Population growth and urbanization have pressured terrestrial landscapes, increasing land utilization to meet socioeconomic needs (Bowler et al. 2020; FAO 2017). As a result, agricultural production often follows unsustainable practices that focus on raising the output per unit of land area. These practices may fail to achieve their intended purpose while continuing to degrade the environment, because food production and ecosystem functions are interdependent (Pellikka et al. 2013). The situation is expected to worsen as an anticipated 2.5 billion people are added to the planet by mid-century. Global demand for food will therefore increase significantly, with adverse environmental effects (Tilman et al. 2011).

Globally, agricultural production demand is central to LULCCs on the Earth's surface. These changes involve transformations within and between various land uses. The most widespread form of LULCCs relates to cropland expansion. This transformation is often accompanied by losses in forestlands, grasslands, wetlands, and other features of ecological importance (Lark et al. 2020; Zeng et al. 2018). Empirical evidence suggests that human actions are central to LULCCs (Mwaniki and Möller 2015). These changes vary across diverse spatial scales and magnitudes based on underlying biophysical and climatic conditions. Globally, cropland expansion resulting from LULCCs has been associated with the growing population, poorly formulated government action plans, environmental influences, and technological advancements (Hassan et al. 2016; Jellason et al. 2021; Kindu et al. 2015; Nakalembe et al. 2017; Pham and Smith 2014; Winkler et al. 2021).

In developing regions, cropland expansion portrays similar patterns and trends. Underlying this fact are the common challenges faced by smallholder farmers, who are the primary players in the food production chain in these regions. The challenges stem from an interplay of production factors such as land, income, market access, and prevailing climate conditions (Giller et al. 2021). Unlike in developed regions, unsustainable land-use practices such as charcoal burning, illegal encroachment, and overgrazing, together with lax law enforcement, are the prevalent drivers of forestland, grassland, and wetland losses (Baldyga et al. 2008; Ewane 2021; Mwangi et al. 2020; Nakalembe et al. 2017). Consequently, these losses induce massive cropland conversions that have severe implications for ecosystem service provision (Song and Deng 2017), hydrological balances (Baldyga et al. 2008), and food production (Hoque et al. 2020). Therefore, understanding LULCCs and their intrinsic drivers is a step towards developing tenable and coherent landscape practices that drive sustainable agricultural production (Kindu et al. 2015).

Regular and up-to-date information on land use dynamics and cropland expansion is required to formulate sound policies that foster sustainable human-environmental interactions. Moreover, information on the drivers of cropland expansion is paramount to offering precise and timely solutions to land-use decisions and regulatory measures. Remote sensing information combined with geospatial approaches provides the most feasible, cost-effective way of obtaining cropland expansion dynamics. The technology thus helps to address the issue of data limitation, especially in the data-sparse environments in developing regions. Kenya has experienced rapid conversions of natural ecosystems to croplands in the recent past (Bullock et al. 2021). The expansions challenge the ecosystems' provision capacities and expose the land to degradation, soil erosion, and biodiversity loss (Mulinge et al. 2016). The expansion in croplands is gradual in high potential agricultural production zones (Kogo et al. 2021). Consequently, it poses a threat to the sustainability of agricultural production, given that only 12% of Kenya's land mass falls under the high potential zones for production (Kabubo and Karanja 2007). However, the effects of various drivers on cropland expansion in these zones remain uncertain, and comprehensive analysis has been lacking to date.

In recent years, various modelling approaches involving qualitative and quantitative data analysis have gained prominence in assessing drivers of cropland expansion. These approaches integrate remote sensing information and geospatial analysis, which allow explicit assessments of LULCCs. Some studies used linear and spatial regression to assess the drivers of deforestation and agricultural expansion. For example, de Espindola et al. (2021) combined satellite information with variables related to proximity, land management, technological resources, and the environment to assess drivers of LULCCs in the Amazon basin. Mwangi et al. (2020) combined boosted regression trees and geographically weighted regression to determine the significance and model the spatial influence of the drivers of LULCCs in Central Kenya. Also in Kenya, Were et al. (2014) employed a logistic regression approach to uncover the drivers of LULCCs in Afromontane forest environments. Other studies have utilized machine learning approaches such as random forest (RF) classification to determine and evaluate the importance of various drivers of LULCCs in the northeastern United States of America (Zhai et al. 2020). Others combined qualitative and quantitative data analysis, such as the study of Kindu et al. (2015), who evaluated drivers of LULCCs in Ethiopia. Moreover, Munthali et al. (2019) combined qualitative data analysis and geographic information systems (GIS)-based processing to assess the drivers of LULCCs in Malawi.

The reviewed studies modelled observed LULCCs, derived through the analysis of remotely sensed imagery, as a function of socioeconomic and biophysical attributes of the landscape. Subsequently, they linked the geographical distribution of land-use transitions to ancillary data to establish the significant drivers and uncover the underlying reasons for the observed patterns. Although these approaches have been applied successfully in LULCC studies, the use of evidence belief functions to assess drivers of LULCCs remains limited. Furthermore, an assessment that applies several methods can combine the inherent strengths of the individual approaches. Therefore, this study combined logistic regression (LR), boosted regression trees (BRTs), and evidence belief functions (EBFs) to assess the drivers of cropland expansion in Trans Nzoia County. This choice is consistent with Campbell et al. (2005), who concluded that complexities in LULCC processes, especially their linkages with social, ecological, economic, and institutional contexts, require multiple approaches to disentangle the drivers of LULCCs.

The present study thus complements the literature in the following ways. First, three modelling techniques were applied to assess cropland expansion, its underlying processes, and the accuracy of the resulting predictions. Second, spatial prediction was conducted to depict varying probabilities of cropland expansion across the study area. Finally, spatially modelled raster surfaces were used to enhance the definition of proximity variables by combining cost functions and linear network analysis in a geospatial environment. In this way, a more realistic measure of proximity is defined as opposed to the Euclidean and buffering approaches common in past studies (Sarkar and Chouhan 2020). Thus, this study aimed to achieve the following objectives:

  1. To assess LULC changes in Trans Nzoia County and their contributions to cropland expansion.

  2. To analyse the key drivers of cropland expansion using LR, BRTs, and EBFs.

  3. To assess the approaches for usability, quality, and accuracy in predicting cropland expansion at a county scale.

Materials and methods

Study area

This study was conducted in Trans Nzoia County, situated in the western part of Kenya and bordering Uganda to the west (Fig. 1). Agriculture is the main economic activity and is characterized by both small- and large-scale farming. Small-scale farmers cultivate crops such as maize, beans, potatoes, and sorghum, while large-scale farmers focus on producing wheat, tea, and sugarcane (Mwaura and Kenduiywo 2021). Livestock keeping, poultry rearing, fishing, and apiculture are practised for subsistence and commercial purposes. Climatically, the county exhibits a bimodal rainfall pattern. The long rainy season occurs between March and May and the short rainy season between October and December. The average annual precipitation is approximately 1300 mm, while the mean annual minimum and maximum temperatures are 12 °C and 26 °C, respectively (Nyberg et al. 2020). The county hosts the Mt. Elgon and Cherangany forest ecosystems, which are among Kenya's prominent water towers (Langat 2018). These ecosystems are catchments for the Nzoia and Suam rivers, which drain into Lake Victoria and Lake Turkana, respectively. The county population is approximately 990,000 people, according to the 2019 Kenya population and housing census (KNBS 2019). Trans Nzoia County was selected for this study due to its leading role as the country's central food basket. In addition, recent substantial LULCCs in the region pose a serious challenge to food security and environmental sustainability.

Fig. 1 Location of the study area and bordering counties (a), the context of Kenya in Africa (b), and the context of Trans Nzoia County in Kenya (c)

Data and data sources

This study used various datasets generated through primary and secondary data surveys, including archived remote sensing images, existing GIS databases, field observations, and discussions with land-use experts. The primary data collection was conducted between May and September 2021. The data collected during this period include ground observations used in the training and validation process of the RS image classification. In addition, land management experts from the Trans Nzoia County lands department provided information about land-based transformations and potential drivers.

The third set of data was sourced from secondary databases. The obtained variables include soil physical and chemical properties, precipitation, temperature, population density, accessibility to water sources, distance to major roads, and proximity to major trading centres. Soil data were obtained from the International Soil Reference and Information Centre (ISRIC); precipitation and temperature variables were sourced from the climatology laboratory of the University of California; population density data were obtained from the Gridded Population of the World (GPW) Version 4 of the Socioeconomic Data and Applications Centre; road network data were obtained from the Humanitarian Data Exchange of the United Nations; rivers data were sourced from the World Resources Institute (WRI); and market centres data were obtained from the Trans Nzoia County Department of Finance and Economic Planning. The proximity to roads, market centres, and rivers was modelled into raster surfaces using ArcGIS's cost distance functions. The detailed procedure for preparing the proximity variables and the raster maps of all the variables used in this study are available in the online resource of this article (ESM_1).

The RS satellite images were acquired from different Landsat sensors, including the Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), and Operational Land Imager/Thermal Infrared Sensor (OLI/TIRS). The data were processed within the Google Earth Engine (GEE) platform, although the individual scenes are also available for download. Three sets of Landsat images from 1990, 2005, and 2020 provided the data for mapping LULCCs. The spectral bands used in this study include the blue, red, NIR, and SWIR bands, each with a spatial resolution of 30 m. The processed images were acquired during the relatively dry season between November and March of the succeeding year. This period allowed for the best comparison across the time epochs, as the phenologies of the land features appear relatively similar. Table 1 outlines the sources, descriptions, and purposes of both the primary and secondary data used in modelling.

Table 1 Overview of datasets used, sources, and their purpose in the study

Image processing and LULC classification

The study used the GEE cloud computing environment to process the Landsat images and generate a time series of land cover maps for three epochs: 1990, 2005, and 2020. The platform permits large-scale data computing, thus minimising the tedious data downloading and storage requirements (Gorelick et al. 2017). Level 2 surface reflectance products were derived for the three epochs. These multitemporal products are already preprocessed for radiometric and geometric corrections and corrected for the atmospheric effects of absorbing and scattering gases and aerosols. The cloud cover threshold was set at 20% to minimize the effect of clouds on the images. Any clouds remaining in the selected images were masked and replaced with pixels from images in the Landsat archive acquired within 60 days of the acquisition date. The cloud score algorithm was used to mask pixels with high cloud cover based on a Landsat quality image file by computing a cloud-likelihood score from 0 (no clouds) to 100 (most cloudy). Finally, the normalised difference vegetation index (NDVI) and the red, blue, NIR, and SWIR bands were used as input features for the land use and land cover (LULC) classifications.
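As an illustration of the vegetation index used as a classification feature, the standard definition NDVI = (NIR − Red) / (NIR + Red) can be sketched in a few lines. The reflectance values and the zero-denominator convention below are hypothetical and are not taken from the study's GEE workflow.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        # Convention chosen for this sketch only (an assumption).
        return 0.0
    return (nir - red) / total


# Hypothetical surface reflectance pairs (NIR, red) for three pixels.
pixels = {
    "dense vegetation": (0.45, 0.05),
    "bare soil": (0.30, 0.25),
    "water": (0.02, 0.04),
}
ndvi_values = {name: ndvi(nir, red) for name, (nir, red) in pixels.items()}
```

Vegetated pixels reflect strongly in the NIR and weakly in the red, so their NDVI approaches 1, whereas water typically yields negative values.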

Supervised classification was performed using the random forest (RF) classifier (Breiman 2001) to map the LULC classes in the region. RF is a nonparametric ensemble learning algorithm. It solves classification problems by fitting multiple decision trees to the training data and assigning each pixel the class that receives the majority vote of the trees. The classifier is robust, achieves high accuracy, and handles outliers and noisy datasets more effectively than other image classifiers (Belgiu and Drăguţ 2016). In this study, the number of decision trees was set at 500 to achieve a good balance between classification speed and accuracy (Belgiu and Drăguţ 2016). The default value was used for variablesPerSplit (√(n_bands)), and the fraction of the input to bag per tree was set to 0.5.
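The voting step of the RF classifier can be illustrated with a minimal sketch. The function name and the vote counts are hypothetical; the sketch shows only how a pixel's class is chosen once each of the 500 trees has made a prediction, not how the trees are grown in GEE.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by most trees and its share of the votes."""
    counts = Counter(tree_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_predictions)


# Hypothetical predictions of 500 trees for a single pixel.
votes = ["cropland"] * 320 + ["grassland"] * 130 + ["wetland"] * 50
label, share = majority_vote(votes)
```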

The classification was conducted based on existing LULC classes in the study area, which include croplands, forestlands, wetlands, grasslands, built-up areas, and other lands. The other land category comprises barren lands, unclassified areas, and other exposed surfaces that do not fall in the former LULC categories. The training and validation datasets were collected from field surveys, existing topographical maps, documented historical land use plans, the local knowledge of the two authors, and visual interpretation of high-resolution imagery derived from Google Earth. The period of study determined the training and validation data used. For instance, historical information was used for the 1990 and 2005 image classifications, whereas ground survey data and updated county spatial plans were used for the 2020 image classification. The number of sample points in 1990, 2005, and 2020 was 975, 1105, and 1544, respectively. The samples were split into training samples for training the RF classifier (70%) and verification samples for accuracy verification (30%).

Accuracy assessment

Accuracy assessment is an integral part of digital image processing, as it reveals the quality and reliability of the classified images. The accuracy of the LULC maps was assessed based on a confusion matrix using an independent validation set of ground-based data. Accuracy assessment metrics, namely producer accuracy, user accuracy, and overall accuracy, were used to evaluate the classification (Congalton 1991). Producer accuracy is the proportion of reference (ground) samples of a class that are correctly shown on the classified map. In contrast, user accuracy is the proportion of samples mapped as a class that actually belong to that class on the ground. The overall accuracy is the percentage of correctly classified samples across all classes. The recommendations of Olofsson et al. (2014) were adopted for accuracy assessment. These outline good practices for area estimation and accuracy assessment of RS image classification. In addition, they provide guidelines for proper reference sample selection and allocation of samples to the different class strata.
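The three metrics can be derived from a confusion matrix as in the following sketch. The matrix orientation (rows are map classes, columns are reference classes), the class subset, and the counts are assumptions made for illustration; the area-weighted estimators recommended by Olofsson et al. (2014) are not implemented here.

```python
def accuracy_metrics(matrix, classes):
    """matrix[i][j]: samples mapped as class i whose reference class is j."""
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    users, producers = {}, {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])                       # all samples mapped as this class
        col_total = sum(matrix[r][i] for r in range(n))  # all reference samples of this class
        users[name] = matrix[i][i] / row_total if row_total else float("nan")
        producers[name] = matrix[i][i] / col_total if col_total else float("nan")
    return overall, users, producers


# Hypothetical validation counts for three of the LULC classes.
classes = ["cropland", "forestland", "grassland"]
matrix = [
    [50, 2, 8],   # mapped as cropland
    [1, 30, 4],   # mapped as forestland
    [4, 3, 48],   # mapped as grassland
]
overall, users, producers = accuracy_metrics(matrix, classes)
```

A low producer accuracy signals omission errors for a class, whereas a low user accuracy signals commission errors.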

Change analysis of land cover maps

Change analysis is a post-classification procedure that detects and quantifies changes in independently produced LULC classifications for different dates. The method provides transitions between land covers, quantifies the land cover changes, and presents information on the distribution of changes in the landscape. In this study, the analysis of changes and their distributions was used to derive binary maps of areas that were converted to croplands in the two time periods. Subsequently, they were used to assess the potential drivers of cropland expansion.
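A minimal sketch of this post-classification comparison is given below, assuming two co-registered class maps flattened into equal-length lists. The pixel values and function names are hypothetical. The cross-tabulation yields the transition counts, and the mask corresponds to the binary cropland conversion map.

```python
from collections import Counter

CROPLAND = "cropland"


def transition_counts(earlier, later):
    """Cross-tabulate two co-registered class maps given as flat pixel lists."""
    if len(earlier) != len(later):
        raise ValueError("maps must have the same number of pixels")
    return Counter(zip(earlier, later))


def cropland_conversion_mask(earlier, later):
    """1 where a non-cropland pixel became cropland, otherwise 0."""
    return [int(a != CROPLAND and b == CROPLAND) for a, b in zip(earlier, later)]


# Hypothetical six-pixel maps for two epochs.
map_1990 = ["grassland", "forestland", "cropland", "wetland", "grassland", "forestland"]
map_2005 = ["cropland", "forestland", "cropland", "cropland", "grassland", "cropland"]

transitions = transition_counts(map_1990, map_2005)
mask = cropland_conversion_mask(map_1990, map_2005)
```

Pixels that were already cropland in the earlier epoch are coded 0, so the mask isolates expansion rather than persistence.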

Modelling cropland expansion

Logistic regression

LR is a machine learning regression technique that assesses the relationship between a categorical (here binary) dependent variable and a set of independent variables, which may be categorical or continuous (Peng et al. 2002). LR involves logit transformation of the dependent variable. The model has the following form:

$$\mathrm{Logit}\,Y=\ln\left(\frac{\pi(x)}{1-\pi(x)}\right)=\alpha+\sum_{i=1}^{n}\beta_{i}x_{i}+\varepsilon \tag{1}$$
$$\pi(x)=P\left(Y=1\mid X=x_{1},\dots,x_{n}\right)=\frac{e^{\alpha+\beta_{1}x_{1}+\beta_{2}x_{2}+\dots+\beta_{n}x_{n}}}{1+e^{\alpha+\beta_{1}x_{1}+\beta_{2}x_{2}+\dots+\beta_{n}x_{n}}} \tag{2}$$

where π (x) is the probability of the outcome of interest, α is the y-intercept, β represents the regression coefficients, ε corresponds to the model error term, and x represents a set of explanatory variables. The antilog of Eq. 1 yields Eq. 2, which predicts the probability of the occurrence of the outcome of interest. The parameters α and β are estimated using the maximum likelihood (ML) method. LR is ideal for handling dichotomous outcomes and can be applied in instances of nonnormality of the dependent variable.
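The relationship between the logit and the predicted probability can be checked numerically with a short sketch. The intercept, coefficients, and predictor values are hypothetical and are not the fitted estimates of this study.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def predicted_probability(alpha, betas, xs):
    """Inverse logit of the linear predictor alpha + sum(beta_i * x_i)."""
    eta = alpha + sum(b * x for b, x in zip(betas, xs))
    # 1 / (1 + e^-eta) is algebraically equal to e^eta / (1 + e^eta).
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical intercept, coefficients, and predictor values.
p = predicted_probability(-1.0, [0.5, 0.2], [2.0, 1.0])
```

Applying `logit` to `p` recovers the linear predictor (0.2 in this example), which is the sense in which the two expressions are inverses of each other.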

Cropland conversion maps were created and used to define the binary outcome. Converted areas were coded as 1, whereas nonconverted zones were coded as 0. The potential drivers of cropland conversion included proximity to rivers, population density, soil type, precipitation, soil organic carbon, time to the nearest urban centres, time to the nearest major road, elevation, soil pH, and slope (Fig. 2). The spatial structure of the drivers was assessed using a semivariogram, and interpolation was conducted in ArcGIS version 10.8.1. A total of 5000 random points were generated in the ArcGIS environment and used as representative samples to model the relationship between cropland expansion and the potential drivers. Spatial dependency effects were minimized by maintaining a minimum distance of 200 m between each sample pair.
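The minimum-distance constraint on the sample points can be illustrated with simple rejection sampling. This is a sketch under stated assumptions, not the ArcGIS procedure used in the study: the rectangular extent, the seed, and the reduced number of points are hypothetical.

```python
import math
import random


def sample_points(n_points, extent, min_distance, seed=0, max_tries=100000):
    """Draw random points, rejecting any closer than min_distance to an accepted point."""
    xmin, ymin, xmax, ymax = extent
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < n_points and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if all(math.dist(candidate, p) >= min_distance for p in points):
            points.append(candidate)
    return points


# Hypothetical 10 km x 10 km extent in metres, with the 200 m spacing used in the study.
points = sample_points(50, (0.0, 0.0, 10000.0, 10000.0), 200.0)
```

Each candidate is compared with every accepted point, which is adequate for a few thousand samples; a spatial index would be preferable for much larger designs.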

Fig. 2 Summary of the workflow integrating land-use changes and potential drivers of cropland expansion based on machine learning and evidence belief functions

The selected points formed the basis for extracting the explanatory variables from the corresponding interpolated surfaces. The data were converted into ASCII format for import into R. Subsequently, LR was implemented using the generalised linear model (glm) function (R Core Team 2020). Of the samples, 80% were used for training the model and 20% for model validation. First, a full model was fitted, followed by a multicollinearity assessment among the predictors using the variance inflation factor (VIF) function in the companion to applied regression (car) package in R (Fox et al. 2019). Stepwise regression using the backward selection procedure was then used to select the statistically significant variables at a 5% significance level. This procedure starts from the full model and iteratively drops the predictors that contribute least to the outcome variable. The model with the lowest Akaike information criterion (AIC) was selected as the parsimonious model for generating probability surfaces of cropland expansion.
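The logic of backward selection by AIC can be sketched as follows, using AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximised likelihood. The log-likelihood contributions are invented so that the example runs without fitting any model; in practice each candidate model would be refitted, and the helper names are hypothetical.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


def backward_select(predictors, score):
    """Repeatedly drop the predictor whose removal lowers the score most."""
    current = list(predictors)
    best = score(current)
    while len(current) > 1:
        trials = [(score([p for p in current if p != drop]), drop) for drop in current]
        trial_score, drop = min(trials)
        if trial_score >= best:
            break  # no removal improves the criterion
        best = trial_score
        current.remove(drop)
    return current, best


# Hypothetical gain in log-likelihood attributed to each predictor.
contribution = {"elevation": 40.0, "soil_ph": 25.0, "slope": 0.4, "soc": 0.2}


def toy_aic(predictors):
    log_likelihood = -500.0 + sum(contribution[p] for p in predictors)
    return aic(log_likelihood, len(predictors) + 1)  # +1 for the intercept


selected, best_aic = backward_select(list(contribution), toy_aic)
```

In this toy setting the two weak predictors are dropped because the penalty of 2 per parameter outweighs their small likelihood gain.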

Boosted regression tree modelling

Boosted regression tree (BRT) modelling is an ensemble method that combines regression trees and boosting algorithms to generate nonparametric statistical models (Schapire 2003). In contrast to conventional statistical models, it fits multiple simple models and combines them to improve prediction accuracy. The rationale is that fitting many trees from several approximate rules and combining them is easier than obtaining a single highly predictive model. The strengths of the BRT model include the potential for handling missing data, accommodation of different types of predictor variables, and robust modelling of nonlinear interactions between variables (Elith et al. 2008). The BRT model was implemented using the generalised boosted models (gbm) package in R (Ridgeway 2005). The parameters specified for the model include a bagging fraction of 0.5, as recommended by Elith et al. (2008), a tree complexity of 5, and a learning rate of 0.005. The bagging fraction specifies the proportion of the training data randomly drawn to fit each tree, tree complexity controls whether interactions are fitted, and the learning rate determines the contribution of each tree to the growing model. The sample points used for training and evaluating the LR were also used in the BRT modelling.
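The roles of the bagging fraction and the learning rate can be illustrated with a deliberately simplified boosting loop. The sketch makes several assumptions that differ from the study's setup: it uses single-split stumps (tree complexity 1) on one predictor, a squared-error loss rather than a binomial deviance, and a larger learning rate so that a short run converges on the toy data.

```python
import random


def fit_stump(xs, residuals):
    """Best single-split regression stump on one predictor (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_trees=200, learning_rate=0.1, bag_fraction=0.5, seed=1):
    rng = random.Random(seed)
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # Each tree sees only a random fraction of the training data.
        bag = rng.sample(range(len(xs)), max(2, int(bag_fraction * len(xs))))
        bag_x = [xs[i] for i in bag]
        bag_r = [ys[i] - predictions[i] for i in bag]
        if len(set(bag_x)) < 2:
            continue  # no split is possible on this bag
        threshold, left_mean, right_mean = fit_stump(bag_x, bag_r)
        stumps.append((threshold, left_mean, right_mean))
        # The learning rate shrinks the contribution of each new tree.
        for i, x in enumerate(xs):
            predictions[i] += learning_rate * (left_mean if x <= threshold else right_mean)
    return base, stumps, predictions


def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Toy step-shaped response on a single hypothetical predictor.
xs = list(range(20))
ys = [0.0] * 10 + [1.0] * 10
base, stumps, fitted = boost(xs, ys)
baseline_error = mse(ys, [base] * len(ys))
boosted_error = mse(ys, fitted)
```

A smaller learning rate requires more trees to reach the same fit but generally gives a smoother, more stable model, which is the trade-off discussed by Elith et al. (2008).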

Evidential belief function

The evidence belief function (EBF) model is founded on the Dempster–Shafer theory of belief (Dempster 1968). It is a data-driven approach that computes mass functions of belief, disbelief, plausibility, and uncertainty using spatial occurrences of phenomena on the Earth's surface (Park 2011). The concept behind estimating these functions is that the locations of geographical phenomena caused by diverse earth processes can be utilized to determine the probabilities of confounding variables. Accordingly, the confounding factors are categorised into several class groups, which are then used to document the various EBF functions. The EBF model is ideally suited for assessing spatial integration processes such as LULC changes (Arasteh et al. 2019). Accordingly, an evidential map layer of the geographical phenomenon is required to compute the various functions.

Based on the functions, high belief values indicate a high likelihood of a factor contributing to an event within a class category, whereas high disbelief values indicate a lower chance. The computations of the belief and disbelief functions integrate the total number of unit cells or pixels within a class category, the number of unit cells of the evidential map layer within the class category, and the total number of unit cells in the exploration area. Equations 3–6 were used to compute the belief and disbelief values, where Fij represents the i confounding factors (drivers) with j class categories. N (Fij) represents the total number of unit cells in class j, whereas N (Fij ∩ A) is the number of unit cells in class j that were converted to cropland. N (A) and N (T) indicate the total number of unit cells converted to cropland and the total number of unit cells in the exploration area, respectively.

$$\mathrm{Bel}_{F_{ij}}=\frac{W_{F_{ij}(\mathrm{Converted\,pixels})}}{\sum_{j=1}^{n}W_{F_{ij}(\mathrm{Converted\,pixels})}} \tag{3}$$
$$W_{F_{ij}(\mathrm{Converted\,pixels})}=\frac{N\left(F_{ij}\cap A\right)/N\left(F_{ij}\right)}{\left[N\left(A\right)-N\left(F_{ij}\cap A\right)\right]/\left[N\left(T\right)-N\left(F_{ij}\right)\right]} \tag{4}$$

The numerator and the denominator in Eq. 4 correspond to the proportion of unit cells converted to croplands in each class factor and the ratio of unit cells converted to other land uses, respectively.

$$\mathrm{Dis}_{F_{ij}}=\frac{W_{F_{ij}(\mathrm{Non\text{-}converted\,pixels})}}{\sum_{j=1}^{n}W_{F_{ij}(\mathrm{Non\text{-}converted\,pixels})}} \tag{5}$$
$$W_{F_{ij}(\mathrm{Non\text{-}converted\,pixels})}=\frac{\left[N\left(F_{ij}\right)-N\left(F_{ij}\cap A\right)\right]/N\left(F_{ij}\right)}{\left[N\left(T\right)-N\left(A\right)-N\left(F_{ij}\right)+N\left(F_{ij}\cap A\right)\right]/\left[N\left(T\right)-N\left(F_{ij}\right)\right]} \tag{6}$$
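The belief and disbelief computations can be sketched for a single driver as follows. The sketch assumes the weight form common in the EBF literature, in which both weights are scaled by N(T) − N(Fij), and it assumes that every class contains cells and that conversions are spread over more than one class. The class names and cell counts are hypothetical.

```python
def ebf(class_cells, class_converted):
    """Belief and disbelief per class of one driver.

    class_cells[j]     = N(F_ij), cells in class j
    class_converted[j] = N(F_ij ∩ A), cells in class j converted to cropland
    """
    n_total = sum(class_cells.values())          # N(T)
    n_converted = sum(class_converted.values())  # N(A)
    w_bel, w_dis = {}, {}
    for j, n_class in class_cells.items():
        n_both = class_converted[j]
        outside = n_total - n_class              # N(T) - N(F_ij)
        w_bel[j] = (n_both / n_class) / ((n_converted - n_both) / outside)
        w_dis[j] = ((n_class - n_both) / n_class) / (
            (n_total - n_converted - (n_class - n_both)) / outside)
    bel_sum = sum(w_bel.values())
    dis_sum = sum(w_dis.values())
    bel = {j: w / bel_sum for j, w in w_bel.items()}
    dis = {j: w / dis_sum for j, w in w_dis.items()}
    return bel, dis


# Hypothetical cell counts for three elevation classes.
cells = {"low": 400, "mid": 400, "high": 200}
converted = {"low": 40, "mid": 120, "high": 20}
bel, dis = ebf(cells, converted)
```

In this example the "mid" class has the largest share of converted cells, so it receives the highest belief and the lowest disbelief.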

Evaluation of the BRT, LR, and EBF models

The models were evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC), which summarises performance across classification thresholds. The ROC curve is a plot of sensitivity against specificity (Mas et al. 2013). Sensitivity gives the proportion of the positive class that was correctly classified, while specificity indicates the proportion of the negative class that was correctly classified. The AUC ranges from 0 to 1. A perfect model yields an AUC of 1, which indicates exact agreement between the predicted values and the observations, whereas a value of 0.5 corresponds to random prediction. The ROC analysis was implemented using the pROC package in R (Robin et al. 2011).
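The AUC can also be read as the probability that a randomly chosen converted location receives a higher predicted score than a randomly chosen nonconverted one. The following sketch computes it directly from that definition; the labels and scores are hypothetical, and the pairwise loop is a didactic alternative to the pROC implementation, not a reproduction of it.

```python
def auc(labels, scores):
    """Share of positive-negative pairs ranked correctly (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical observed outcomes (1 = converted) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
model_auc = auc(labels, scores)
```

Here eight of the nine positive-negative pairs are ordered correctly, giving an AUC of 8/9.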


Land use and land cover changes in Trans Nzoia County

The land cover maps for Trans Nzoia County reveal changing land use and land cover dynamics. Across the study area, the dominant land cover class in the studied epochs was cropland. The initial coverage of cropland in 1990 was 33% of the study area, and the coverage further increased to 66% and 72% in 2005 and 2020, respectively. The changes demonstrate a rapid expansion of croplands between 1990 and 2005, followed by a slow, albeit increasing, expansion until 2020. Over the period, the area under croplands grew at the cost of forestlands, wetlands, and grasslands. Spatially, the land cover distribution indicates that croplands occupied the central areas of the county, whereas forestland and grassland classes dominated the western and northeastern parts, respectively (Fig. 3).

Fig. 3 LULC maps for (a) 1990, (b) 2005, and (c) 2020

Built-up areas recorded positive growth over the study period. The coverage in 1990 was 2.4 km², which translates to approximately 0.1% of the total land area. At that time, a few pockets of built-up zones were evident in the main town, located at the centre of the county. The coverage increased to approximately 10 km² by 2005, with most expansion occurring in the areas surrounding the main town. Additionally, some sections of the western and northern parts of the county experienced a notable increase in built-up coverage. The highest coverage of built-up areas was recorded in 2020, with an approximate area of 36 km². In this recent period, built-up areas expanded rapidly and extended along the main transport corridors through the county. Although the built-up area class accounted for the smallest proportion of the total land area, the findings of this study show that it experienced the highest relative growth in the study period.

Forestland declined over the 30 years. In 1990, the area under forest cover was approximately 500 km². The coverage was reduced to 447 km² and 335 km² in 2005 and 2020, respectively. The decline in forest cover was higher (− 25%) in the 2005–2020 period than in the 1990–2005 period (− 11%). The reduction was more pronounced in the forested areas of Mt. Elgon and Cherangany hills. The land cover maps further indicate that tree cover along river courses and creeks was drastically reduced. Similarly, planted forest patches were gradually cleared. The effect is evident from the 2020 LULC map (Fig. 3c). Likewise, wetlands exhibited a continuous decline over the 30 years. The study area's wetlands comprise permanent streams, open water, seasonal and permanent marshes, riverine vegetation, scrub, forested wetlands, and seasonal flood plains. In 1990, wetlands occupied 270 km². The coverage declined to 230 km² and 73 km² in 2005 and 2020, respectively. The decline was higher in the 2005–2020 period (− 68.2%) than in the 1990–2005 period (− 15%).
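The percentage changes quoted for each period follow from the reported areas as (later − earlier) / earlier × 100. The sketch below recomputes them from the approximate areas given in the text, so small differences from the reported percentages reflect rounding of those areas.

```python
def percent_change(earlier, later):
    """Relative change between two epochs, in per cent."""
    return (later - earlier) / earlier * 100.0


# Approximate areas in km² for 1990, 2005, and 2020, as reported in the text.
areas_km2 = {
    "forestland": (500.0, 447.0, 335.0),
    "wetlands": (270.0, 230.0, 73.0),
    "built-up": (2.4, 10.0, 36.0),
}

changes = {
    name: (percent_change(a, b), percent_change(b, c))
    for name, (a, b, c) in areas_km2.items()
}
```

For forestland this gives about − 10.6% and − 25.1% for the two periods, and for wetlands about − 14.8% and − 68.3%.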

Grasslands covered 19% of the total area in 1990 and declined by 16% in 2005. In 2020, there was a slight increase of 1% in grassland cover. Grasslands dominated the western region of the county and the periphery of the Cherangany hills forest in 1990. The LULC dynamics show that extensive grassland areas were rapidly transformed into croplands in the 1990–2005 period. Approximately 72% of grassland cover was converted to croplands during this period (Fig. 3a,b).

Accuracy assessment

The accuracy assessment statistics of the LULC classifications are presented in Table 2. The overall accuracies for 1990, 2005, and 2020 are 82%, 93%, and 93%, respectively. The producer accuracies were above 70%, except for the built-up area in 2020, which was 53%. The low accuracy in the built-up area classification can be attributed to the mixed-pixel problem common in urban areas, as they rarely transition to other land covers.

Table 2 Overall, producer and user accuracies of the LULC classifications in 1990, 2005, and 2020

Change analysis of land cover maps

The Sankey plot (Fig. 4) characterizes the distribution of the major LULC transitions over the 30 years. A noticeable trend from the plot is that most of the LULC transitions were directed towards croplands, as shown by the width of the links connecting each land cover node. From the plot, 44% of wetland conversion between 1990 and 2005 was to cropland, and this share increased to 62% in the 2005–2020 period.

Fig. 4 Sankey plot showing land cover transitions in the 1990–2005 and 2005–2020 epochs

The dynamics of the grassland cover show that 72% of grassland coverage was converted to croplands between 1990 and 2005. However, the conversion declined to 62% between 2005 and 2020. Similarly, the forestland class lost 26% of its coverage to croplands between 1990 and 2005. The other lands category, comprising barren land, artificial surfaces, pockets of water features, and other unclassified features, lost 67.9% and 32.4% of its coverage to cropland in the 1990–2005 and 2005–2020 periods, respectively. Overall, croplands gained significantly from the major land covers in the study area.

Drivers of cropland expansion

Logistic regression

The logistic regression results provide the relative influences of the individual drivers on the cropland expansion in the study area. Multicollinearity assessment revealed low Pearson correlation values among the drivers except for access to major roads and market centres (0.98). Accessibility to major roads, therefore, was dropped from any further analysis. The retained drivers had low VIF values (< 2), thus confirming the absence of multicollinearity. The drivers were further analysed using stepwise regression to quantify their effect on cropland expansion. The aim was to obtain a parsimonious model with statistically significant drivers that explain cropland expansion (Table 3). The drivers' estimates were transformed into percentage odds to assess the relative contribution of each driver to cropland expansion.

Table 3 Logistic regression model estimates and percentage odds of cropland expansion drivers

The findings show that the contribution of elevation was positive: the odds of cropland expansion increased by 2.1% for every 1-unit increase in elevation while controlling for other variables (Table 3). Similarly, the contribution of accessibility to market centres was positive, with a unit increase in distance from market centres increasing the odds of cropland expansion by 0.7%. The result confirmed that croplands expand farther from market centres into rural areas, which is typical of Kenyan landscapes. Regarding soil drivers, soil pH was the only variable that showed statistical significance, and it had a large marginal effect relative to the other drivers. Accordingly, a unit increase in soil pH increased the odds of cropland expansion by 25%. Accessibility to water sources showed a negative effect, where a unit increase in proximity to water sources decreased the odds of cropland expansion by 1.27%.
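The percentage odds reported here relate to the logistic regression coefficients through (e^β − 1) × 100. The sketch below applies this conversion in both directions. The coefficients are back-calculated from the percentages quoted in the text for illustration and are not copied from Table 3.

```python
import math


def percent_odds(beta):
    """Percentage change in the odds for a one-unit increase in a predictor."""
    return (math.exp(beta) - 1.0) * 100.0


def coefficient_from_percent_odds(pct):
    """Inverse of percent_odds: the coefficient implied by a percentage change."""
    return math.log(1.0 + pct / 100.0)


# Percentage odds quoted in the text; the driver labels are shorthand.
reported = {"elevation": 2.1, "market access": 0.7, "soil pH": 25.0, "water access": -1.27}
implied_betas = {name: coefficient_from_percent_odds(pct) for name, pct in reported.items()}
```

A negative percentage corresponds to a negative coefficient, and small coefficients are approximately equal to the percentage change divided by 100.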

Boosted regression trees

Partial dependence plots (PDPs) summarised the results obtained from the BRT model (Fig. 5). The plots model each driver's relationship to cropland expansion while controlling for other factors. The probability of cropland expansion is indicated on the y-axis, whereas the data distribution is plotted on the x-axis. The assessed drivers demonstrated varied influence on cropland expansion based on the data range on the x-axis. For instance, areas with low slopes (< 10%) depicted a high likelihood of cropland expansion, with the probability declining as the gradient increased. The PDP for elevation revealed that regions between 1750 and 2000 m have a high likelihood of cropland expansion.

Fig. 5

Partial dependence plots indicating marginal effects of the drivers on cropland expansion. The x-axis shows the data distribution, and the y-axis indicates the probability of cropland expansion

Soil properties indicated that regions characterised by low soil organic carbon (SOC) had high tendencies to be converted to croplands. However, areas with high SOC values were less likely to be converted to croplands. The possible explanation is that marginal areas are increasingly experiencing exploitation for agricultural production. Regarding soil pH, regions with values that range between 5.5 and 6.0 showed a high likelihood of expansion, which declined in the neutral and less acidic zones. Although the relative contributions of population density and proximity variables were low, the results showed a significant trend in their factor ranges. Notably, areas within 1-hour access of rivers indicated a high probability of cropland expansion. The plot showed a high likelihood of cropland expansion for population density in low-populated regions (0–900 people per square kilometre).
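The idea behind a partial dependence curve can be sketched as follows: fix one driver at a grid value for every observation, keep the other drivers as observed, and average the model's predictions. The `toy_predict` function below is a hypothetical stand-in for a fitted model, and the rows are invented; neither reflects the study's BRT model or data.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # override only the driver of interest
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_predict(row):
    """Hypothetical probability model: expansion is less likely on steep slopes."""
    z = 1.0 - 0.15 * row["slope"] + 0.001 * (row["elevation"] - 1800)
    return 1.0 / (1.0 + math.exp(-z))


# Toy observations (hypothetical): slope in percent, elevation in metres.
rows = [
    {"slope": 2, "elevation": 1750},
    {"slope": 8, "elevation": 1900},
    {"slope": 15, "elevation": 2100},
    {"slope": 25, "elevation": 2400},
]
curve = partial_dependence(toy_predict, rows, "slope", [0, 5, 10, 20, 30])
for value, prob in curve:
    print(f"slope {value:>2}%: mean predicted probability {prob:.3f}")
```

Because the other drivers keep their observed values, the curve shows the average effect of the chosen driver rather than its effect at a single location.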

Evidence belief functions

The EBFs of cropland expansion are presented in Table 4. Several drivers showed high belief values, thus providing strong evidence of cropland expansion. They comprise proximity to rivers, proximity to market centres, SOC, elevation, population density, soil pH, and precipitation.

Table 4 Belief, disbelief, and uncertainty values for different drivers of cropland expansion

The EBF model indicated that regions within 1 h of proximity to water sources have a high likelihood of cropland expansion, as revealed by a high belief value (0.342). Similarly, areas with a population density between 240 and 440 people per square kilometre showed high probabilities of cropland expansion (Bel = 0.31). Low SOC (Bel = 0.423) and high soil pH zones (Bel = 0.42) revealed a high likelihood of conversion. For access to market centres, areas within 2 hours of proximity showed a high probability of experiencing cropland expansion (Bel = 0.281). The results further indicated that low precipitation zones were more likely to be converted to croplands (Bel = 0.368). The high belief values in low SOC and low precipitation zones imply that cropland expansions also target marginal zones. Cropland expansions were also prevalent in elevation ranges between 2000 and 2800 m and between 2400 and 2800 m, as indicated by high belief values and frequency ratio scores.
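Belief, disbelief, and uncertainty values of this kind can be derived from pixel counts per driver class. The sketch below follows one commonly used data-driven formulation of evidential belief functions; the text does not state which formulation the authors applied, so it should be read as an assumption. The class names and counts are hypothetical, and the code assumes every class contains some, but not all, of the expansion pixels so that no denominator is zero.

```python
def evidential_belief(classes):
    """classes maps a class name to (pixels_in_class, expansion_pixels_in_class)."""
    n_total = sum(n for n, _ in classes.values())
    n_event = sum(d for _, d in classes.values())
    w_bel, w_dis = {}, {}
    for name, (n_c, n_cd) in classes.items():
        outside = n_total - n_c
        # Expansion rate inside the class relative to the rate outside it.
        w_bel[name] = (n_cd / n_c) / ((n_event - n_cd) / outside)
        # Non-expansion rate inside the class relative to the rate outside it.
        w_dis[name] = ((n_c - n_cd) / n_c) / (
            (n_total - n_event - (n_c - n_cd)) / outside
        )
    sum_bel, sum_dis = sum(w_bel.values()), sum(w_dis.values())
    result = {}
    for name in classes:
        bel = w_bel[name] / sum_bel
        dis = w_dis[name] / sum_dis
        result[name] = {
            "belief": bel,
            "disbelief": dis,
            "uncertainty": 1.0 - bel - dis,
            "plausibility": 1.0 - dis,
        }
    return result


# Hypothetical travel-time classes: (pixels in class, expansion pixels in class).
travel_time = {"0-1 h": (400, 120), "1-2 h": (350, 60), ">2 h": (250, 20)}
ebf = evidential_belief(travel_time)
for name, values in ebf.items():
    print(name, {k: round(v, 3) for k, v in values.items()})
```

In this formulation, belief and disbelief are each normalised across the classes of a driver, and uncertainty is whatever remains after both are subtracted from 1.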

Performance of the models

The model performances were evaluated based on the receiver operating characteristic (ROC) curves (Fig. 6). The AUC values ranged from 0.77 to 0.96, with the LR model showing excellent performance (0.96) and the BRT model achieving the lowest accuracy (0.77). The AUC value obtained using the EBF model was moderate (0.86). Nonetheless, the obtained values showed good to excellent performance in assessing cropland expansion in Trans Nzoia County.

Fig. 6

ROC curves and AUC values showing the accuracies of the LR, BRT, and EBF models
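The AUC can be read as the probability that a randomly chosen expansion location receives a higher predicted score than a randomly chosen non-expansion location. A minimal sketch of that pairwise definition is given below; the labels and scores are invented for illustration and are not the study's validation data.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy validation points: 1 = expansion observed, 0 = no expansion.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of validation points, which is fine for illustration; a rank-based computation would be preferable for large samples.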

Probability of cropland expansion

Cropland expansion probability maps (Fig. 7) were generated to visualize the spatial patterns and to establish the drivers' contributions across the study area. The models agreed well in the characterization of the cropland expansion patterns. The predicted surfaces revealed that the western parts bordering Mt. Elgon forest and the Cherangany hills ecosystems are at a high risk of cropland expansion. The BRT and EBF models revealed more similar expansion patterns than the LR model. Nonetheless, all models showed a high probability of cropland expansion in the western parts, which declined towards the central regions and increased again in the eastern region.

Fig. 7

Surfaces showing the probability of cropland expansion based on a logistic regression, b evidence belief functions, and c boosted regression trees

The LR model showed a higher likelihood of cropland expansion in the western and eastern parts than in the southern region. Additionally, few areas in the north revealed a high probability of cropland expansion. The predictions showed a close fit between the different approaches, thus demonstrating the robustness of the assessed drivers in characterizing cropland expansion in the region. Blending different techniques provides an array of statistical measures that help to understand the magnitude, nature, and direction of cropland expansions. For instance, LR and BRT provided relative contributions of various factors, whereas BRT and EBF provided the factors' range of influence on cropland expansion.


LULC classification and accuracies

In this study, a detailed evaluation of long-term LULCCs (1990–2020) was conducted in Trans Nzoia County. Six dominant land covers were mapped to assess their spatial coverage over the 30 years. The overall classification accuracies were above 82%, with a high classification accuracy of 93% in the 2005 and 2020 classifications. Similarly, the user and producer accuracies indicated strong agreement between the mapped classes and the reference data. One exception was in the classification of built-up areas, where the producer accuracy was 53%. A possible explanation for this observation is that the classifier may have misinterpreted urban areas because of mixed pixels. The spatial resolution of the Landsat pixels used in the classification process was 30 m. However, buildings and surfaces in the region cover smaller areas, so single pixels mix several land use classes. This problem, which results in spectral confusion, is common in characterising urban footprints (Forget et al. 2018).

LULC changes and cropland expansion

The LULCCs show that the study area experienced losses in forestland, grassland and wetland land covers. Conversely, built-up land and croplands increased in the same period, with gains from wetlands, forestlands, and grasslands. Similar cropland expansion trends and intra-class transitions have been observed in other studies conducted in western Kenya (Becker et al. 2016; Masayi et al. 2021; Rotich and Ojwang 2021). In the present study, the area under crops increased from 83,132 ha in 1990 to 166,420 ha in 2005 and then further increased to 180,222 ha in 2020. The observed expansion may be attributed to market forces, government extension initiatives, and population growth in the study region. The demographic assessment shows that in the 1990s, Kenya witnessed a paradigm shift in the agricultural sector, brought about by market liberalization and increased access to credit services (De Groote et al. 2006). These initiatives fostered rapid agricultural development, especially in high-potential regions such as Trans Nzoia County.
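The scale of the expansion is easier to compare across the two periods as relative change. The short calculation below uses only the three cropland areas quoted above; the helper name is hypothetical.

```python
def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# Cropland area in hectares, as quoted in the text.
cropland_ha = {1990: 83_132, 2005: 166_420, 2020: 180_222}

change_1990_2005 = percent_change(cropland_ha[1990], cropland_ha[2005])
change_2005_2020 = percent_change(cropland_ha[2005], cropland_ha[2020])
change_1990_2020 = percent_change(cropland_ha[1990], cropland_ha[2020])

print(f"1990-2005: {change_1990_2005:.1f}%")
print(f"2005-2020: {change_2005_2020:.1f}%")
print(f"1990-2020: {change_1990_2020:.1f}%")
```

On these figures, cropland roughly doubled between 1990 and 2005 and then grew by about 8% between 2005 and 2020, so most of the expansion occurred in the first period.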

Population growth also contributed substantially to cropland expansion in the county. According to Kenya's population and housing census statistics, the population of Trans Nzoia County in 1989 was 393,682 people (GOK 1994). The population rose by 44% between 1989 and 1999 and increased to 818,000 by 2009 (GOK 2010). The observed growth may have induced a direct effect on cropland expansion due to the rising demand for land for settlement and food production purposes. KEFRI (2017) reported that human activities have increased in the area due to the rising population in the Mt. Elgon and Cherangany Hills forests and their borders. Our results also highlight massive losses of forested areas, wetlands, and grasslands bordering these water towers. Population growth, therefore, played a primary role in the loss of forestland, wetlands, and grassland cover. Jayne and Muyanga (2012) also noted that Kenya's western region is densely populated.

The demographic dynamics in the region affected existing farming systems and land management, encouraging unsustainable environmental practices such as deforestation, poor soil management, and encroachment on natural ecosystems (Allaway and Cox 1989). Ongugo et al. (2014) observed that politically motivated excisions in the Mt. Elgon forest ecosystem led to massive destruction of forests for settlement purposes. As a result, the beneficiaries took advantage of the scheme to clear extra land for agricultural use. Masayi et al. (2021) also noted that intensive mixed farming in the Mt. Elgon forest ecosystem resulted in massive forest and biodiversity losses. The LULCC dynamics and demographic influences corroborate previous studies conducted in western Kenya (Kogo et al. 2021; Mutoko et al. 2014).

Spatially, buffer zones and foothills of the Mt. Elgon and Cherangany Hills forest ecosystems experienced rapid cropland expansion over the study period. Further expansions were prominent along the major river channels of the Nzoia and Sabwani Rivers. The finding agrees with the National Environment Management Authority (NEMA) district environmental action plan of Trans Nzoia County, which documented encroachments on protected areas by the locals to boost their household food production (GOK 2019). Similarly, Maua et al. (2022) and Ondiek et al. (2020) observed that the rich wetland resources of the Nzoia and Lake Victoria drainage systems experienced rampant anthropogenic exploitation within the period covered by our study. Studies in Kenya have also attributed cropland expansion to agricultural extensification practices (Eckert et al. 2017; Mwangi et al. 2018). At the SSA scale, cropland expansion in the late twentieth century resulted from rising economic activities, improved technology, declining soil fertility, and climate change variability (Jellason et al. 2021).

Cropland expansion and drivers

Cropland expansion in Trans Nzoia County was analysed using three modelling techniques: BRT, LR, and EBF. The results revealed some consistencies in drivers’ influences and cropland expansion prediction. The three models revealed factors such as proximity to rivers, elevation, soil pH, and market accessibility as crucial factors of cropland expansion in the region. Furthermore, the BRT and EBF models found that slope, SOC, and population density significantly affected cropland expansion (p < 0.05). On the other hand, LR associated elevation and proximity to rivers with cropland expansion. Regular physical and chemical analyses of water quality parameters in the Nzoia Basin confirmed high levels of phosphates and nitrates in water sources caused by intensive agricultural activities (KEFRI 2018; Twesigye et al. 2011). Likewise, Enanga et al. (2011) found human activities to be the fundamental causes of increased pollution in riparian buffer strips of hydrological watersheds in Kenya. The rising contamination can be attributed to intensive agricultural activities close to key water features.

The BRT model revealed a high likelihood of cropland expansions in areas of high soil acidity. According to Hijbeek et al. (2021), soils in western Kenya are predominantly acidic, which might have resulted in the more acidic samples used in the BRT model training. The EBF model uses class categories; thus, the observations might have captured the dynamics in the alkaline zones. In other studies in Kenya, for example, Were et al. (2014) found that soil pH was a key driver of LULCCs in the Eastern Mau forest reserve. Regarding population density, a high likelihood of expansion was reflected by the EBF model, whereas the BRT model revealed a medium influence. Accordingly, areas with a population density ranging from 240 to 440 people per square kilometre demonstrated a high likelihood of expansion based on the EBF model. One possible explanation is that areas with low population densities have more space to be utilized for agricultural use than highly populated areas.

Other studies within the Kenyan context pointed out some of the drivers identified in this study as significant to cropland expansion. For instance, Serneels and Lambin (2001) noted accessibility to markets and agroclimatic factors as critical drivers in the Narok district. In addition, Mwangi et al. (2020) reported population, proximity to rivers, and proximity to roads as crucial drivers of cropland conversions in Central Kenya. Moreover, Were et al. (2014) found soil pH, population density, precipitation, distance to towns, and rivers to be the significant drivers of LULCCs in the Eastern Mau forest reserve. A study in the Eastern Arc Mountains of Taita in coastal Kenya found strong associations between proximity variables and woodland-cropland conversions (Maeda 2011). Another study in the agro-pastoral regions of the Kajiado district found that changing preferences from herding to crop production and population density were the prominent drivers of cropland expansion (Campbell et al. 2005).

Comparable trends in cropland expansions are also noted in studies conducted in East Africa and the larger SSA region. Kindu et al. (2015) and Betru et al. (2019) found drivers of LULCCs in Ethiopia to fall within the broader categories of social, economic, environmental, policy, demographic, and technological forces. According to the researchers, some fundamental drivers included population density, livestock ranching, climate change, accessibility to markets, and accessibility to major road networks. In addition, Msoffe et al. (2011) found population and subsistence-driven agriculture to be the main drivers of cropland expansion in Northern Tanzania. A review of studies conducted in Uganda by Kilama Luwa et al. (2021) similarly noted population density as the primary driver of the observed LULCCs in the region. In Malawi, Li et al. (2021) and Munthali et al. (2019) found elevation, proximity to water sources, population, and human activities to be significant drivers of LULCCs.

Similar patterns of LULCCs and the associated drivers have also been found in studies conducted outside of Africa. For instance, Duraisamy et al. (2018) found access to water sources and improved road networks to be the main drivers of LULCCs in India. Zaveri et al. (2020) reported that dry rainfall anomalies contributed to cropland expansion in other developing regions outside the African context. Similar trends in cropland expansion have also been witnessed in developed countries. For example, a study in the United States of America shows that cropland expansions marginally targeted agricultural zones (Lark et al. 2020). In the European Union, however, a study by Kuemmerle et al. (2016) showed mixed findings, with declines and hotspots of cropland expansion depending on the region. There is a clear divide in the factors driving cropland dynamics between developing and developed areas. While the developed regions appear to have well-regulated policies that guide land use and management, less developed regions still face challenges in enforcing land management policies. Poor land tenure practices and systems for controlling land management are considered barriers to effective land management in these regions. Similarly, inequalities in land ownership promote illegal land-use activities that target land expansion (Keijiro and Frank 2014; Schürmann et al. 2020).

Overall, our study confirms that, similar to other global regions, drivers of cropland expansion in Trans Nzoia County, Kenya, are multifaceted, cutting across socioeconomic, climatic, and proximity factors. The modelling approaches used for assessment agreed on the fundamental aspects of cropland expansion and its spatial prediction. At the same time, they provide an understanding of the drivers of change in a spatially explicit manner.


The present study investigated cropland expansion resulting from LULCCs, and its drivers, in Trans Nzoia County, Kenya. The study found that the county experienced rapid land use transformation in the past three decades, resulting in the loss of forests, wetlands, and grasslands. Consequently, cropland areas increased at the expense of these land cover types. The expansion resulted from an interplay of socioeconomic, climatic, and soil drivers. LR, BRT, and EBF models were used to assess the relative contribution of the drivers and to generate spatial prediction surfaces for cropland expansion.

Based on the modelling results, we conclude that elevation, soil pH, and proximity to rivers are critical drivers of cropland expansion in the region. In particular, soil pH has a significant influence based on the relative contribution assessment from the LR and BRT models and the belief values from the EBF model. We also conclude that climatic, soil, and biophysical drivers influence cropland expansion across landscapes. Logistic regression performed best in characterising the drivers of cropland expansion in the region. Applying this model together with soil, proximity, and topographical variables is therefore recommended for credible cropland expansion modelling in the future.

Therefore, the study provides insights into target areas for sustainable land management and conservation of the natural environment. County and national governments should integrate the drivers of LULCCs into their routine resource planning to foster harmony between food production and environmental protection. Such approaches are currently scarce, and the findings of this study create a solution pathway, especially in light of the rapidly growing population, urbanization, and increased human pressure on natural resources.