Introduction

Land-use change is recognised as one of the leading causes of biodiversity change, future extinction risk (Davison et al. 2021; Jaureguiberry et al. 2022) and climate change (Betts et al. 1997; Cox et al. 2000; Cramer et al. 2001; Daramola and Xu 2021; Boulton et al. 2022). Understanding land-use and land-cover change (LULCC) is therefore an important prerequisite for understanding environmental change and the interactions between human and natural drivers of ecosystem processes (Song et al. 2018). It is also a critical component of climate modelling, due to the greenhouse gas emissions and sequestration associated with different land uses, and the need to account for radiative climate forcing associated with land cover albedo changes (Brovkin et al. 2004; Pielke et al. 2011; Song et al. 2021). Processes set in train by LULCC may take decades or longer to play out and may be legacies of previous changes, prompting reconstructions of land use at centennial scales based on either the collation of archaeological and historical evidence (e.g. Widgren’s (2018) and Kay et al.’s (2019) maps of historical agricultural land use in Africa) or on modelling likely past land use, such as the History of the Global Environment database (HYDE) (Klein Goldewijk 2001; Klein Goldewijk et al. 2011, 2017) alongside other global models of historical land use such as SAGE (Centre for Sustainability and the Global Environment) (Ramankutty and Foley 1999; Ramankutty et al. 2008, 2018), KK10 (Kaplan et al. 2009, 2010) and ML08 (Millennium Land Cover Reconstruction) (Pongratz et al. 2008). Despite the importance of global land-use models within global climate models (e.g. for the World Climate Research Program 6th Coupled Model Intercomparison Project, CMIP6 (Hurtt et al. 2020)), few attempts have been made to verify these land-use models against detailed archaeological reconstructions.

The HYDE model, in particular, has gained popularity as an important resource for quantifying the effects of LULCC on past environments and climate (as discussed by Gaillard et al. 2010; Li et al. 2019; Zhang et al. 2021), and HYDE model outputs are frequently used in global climate and carbon-cycle modelling, for example in CMIP6 and in the IPCC Sixth Assessment Report (AR6) Synthesis Report (Ellis et al. 2013; Hurtt et al. 2020; Lee et al. 2023). HYDE reconstructs historical LULCC at the global scale based on assumptions of population size, crop growth suitability metrics and climatic conditions to create maps of the patterns in land use (Klein Goldewijk et al. 2011, 2017). Although there has been an increase in global modelling studies (Ellis et al. 2021; Winkler et al. 2021), few studies have explored the uncertainties embedded in these models that would affect model results. These uncertainties may arise from data gaps and from limited spatial or temporal resolution, and biases may be introduced where model development and assumptions based on some regions do not apply in geographically, temporally and culturally different contexts.

Some studies have highlighted the uncertainties that are general to global land-use models (Klein Goldewijk and Verburg 2013; Prestele et al. 2017), and several regional studies have explored the accuracy of HYDE’s land use reconstructions against detailed historical datasets (He et al. 2013; Li et al. 2019; Wu et al. 2020; Zhang et al. 2021; Zhao et al. 2022). One study focused on China found that the spatial patterns of cropland produced by the land-use model were broadly accurate at the global scale but were increasingly inaccurate in gridded cropland allocation at finer regional resolutions, while also overestimating annual growth rates in cropland area (He et al. 2013). Another study focused on Germany found that cropland allocation by HYDE did not match historical settlement evidence, missing areas of cropland expansion along rivers and wetlands (Zhang et al. 2021). It is therefore important to find out if similar or greater inconsistencies exist for Sub-Saharan Africa, where model extrapolations based predominantly on developmental patterns in other parts of the world require close examination.

LULCC reconstructions for Europe and Asia have been able to incorporate both qualitative and quantitative evidence (often at a national scale), whereas such detailed historical information on population and settlement patterns is not available for large parts of Africa. Indeed, the often-dramatic changes brought about through European colonialism in much of Africa during the nineteenth and twentieth centuries mean that extrapolation from modern land-use patterns and population data is extremely problematic. This study therefore aims to assess how effectively the global land-use models estimate historical land use in sub-Saharan Africa in comparison to archaeological reconstructions. We hypothesise that there will be systematic biases in the predictions of HYDE, relative to the archaeological and historical record. By looking at the variability in the distribution of agricultural land between HYDE and the archaeological reconstructions, we can refine our understanding of historical land-use and land-cover change and identify possible social and environmental predictors that influenced historical LULCC.

Study area and methods

The study area of this research covers the African continent south of the Sahara (including Madagascar, but excluding other islands). The climate, based on the Köppen-Geiger classification, ranges from equatorial to warm temperate to tropical rainforest and savannah to arid steppe and desert (Beck et al. 2018; Nzabarinda et al. 2021). Vegetation distribution is influenced by rainfall and temperature variations across the region, with the key biomes being African tropical forests, tropical savannah and bushland, and Mediterranean-type forest-woodland-scrub (Adeleye et al. 2022). Rainfall decreases away from the equator, such that tropical rainforests receive year-round precipitation while other areas receive seasonal precipitation, declining to negligible quantities in the Sahara and Kalahari Deserts (Nzabarinda et al. 2021).

This study used the datasets on cropland for 1800 CE from HYDE v3.2.1 and Widgren’s (2018) map of agricultural systems by 1800 CE (Table 1). In the most general of terms, evidence of extensive agriculture can be found across Africa prior to the 1700s, for example, with the episodic spread of the Iron Age technologies from 1000 BCE to 1500 CE, correlating with the widespread adoption of agriculture and pastoralism (Kay and Kaplan 2015; Ramankutty et al. 2018). From the 1700s onwards, agricultural land use expanded gradually and then expanded rapidly again in the mid- to late nineteenth century, when increasing globalisation and European colonial expansion associated with the global Industrial Revolution led to rapid agricultural expansion in many areas (Lambin et al. 2003; Ramankutty et al. 2018). Thus, 1800 CE presents a particular time period of interest in this study as a point of transition in the evidence of pre-colonial agricultural land use and prior to the Industrial Revolution.

Table 1 Area of HYDE cropland and Widgren land-use zones in sub-Saharan Africa (including Madagascar) by 1800 CE (estimated Widgren areas represent the maximum possible area covered by Widgren’s land-use categorisations)

The HYDE 3.2.1 dataset consists of a series of spatially explicit maps of historical land use covering the period 10,000 BCE to 2015 CE at a resolution of 5 arc minutes (Klein Goldewijk et al. 2011, 2017). Historical land use is modelled by combining satellite information, specific allocation algorithms with time-dependent weighting maps and statistical information on populations, cropland and pastureland. HYDE’s statistics for cropland and pasture post-1960 are primarily obtained from FAO land-use statistics while pre-1960 values are estimated using per capita allocation of cropland and pasture. Land cover at 2000 CE is based on satellite data and used as a weighting map for cropland allocation (Klein Goldewijk et al. 2011). Population data since 1950 is based on the United Nations World Population Prospects data, while pre-1950 historical estimates are based on a combination of secondary sources on modelled world population history (Klein Goldewijk et al. 2017). Spatial patterns of population were allocated using weighting maps based on population density maps from LandScan for the year 2012 CE, and historic patterns were estimated using weighting maps with proxies such as soil suitability and distance to water (Klein Goldewijk et al. 2011), with soil suitability based on the Global Agro-Ecological Zones model (GAEZ v4) (Fischer et al. 2021, FAO and IIASA 2022). The resultant spatial maps generated show the distribution and area (in km2 per grid cell) of cropland, rice farming, pasture and rangeland from 10,000 BCE to 2015 CE (Klein Goldewijk et al. 2017).

Widgren (2018) developed a preliminary map of African agricultural systems by 1800 CE that charts the presence of agriculture and characterises the dominant agricultural system in sub-Saharan Africa during this specific time period. The dataset maps the distribution of different agricultural systems across sub-Saharan Africa by c. 1800 CE based on global categories of agricultural intensity in order to facilitate comparison between global regions (Widgren 2017). The categories range from pastoralism to extensive farming and permanent fields to intensive systems such as terracing and irrigation (Widgren 2018). Widgren’s map focuses on charting qualitatively different agricultural regions with the aim of visualising what is known of precolonial agricultural systems, rather than mapping land cover. The map thus synthesises archaeological investigations, records of (primarily European) traveller accounts, ethnographic and linguistic studies, oral histories and archaeobotanical evidence to map and characterise the different agricultural zones and the dominant agricultural systems in each zone.

Comparison of cropland area and distribution across Africa

The total area and gridded cropland distribution in HYDE 3.2.1 for 1800 CE were compared with the land-use zonation from Widgren for that same period. The HYDE cropland map aggregates all forms of agriculture into one broad category and focuses on distinguishing the percentage area of each grid cell that is dedicated to cropland, while Widgren (2018) categorises different forms of agricultural land use according to different levels of agricultural intensity. Spatial resolution and projection were first unified between the HYDE and Widgren maps, using a cell resolution of 8885.95 m × 8885.95 m (the average cell resolution of the HYDE map, to which the Widgren map was adjusted) and the Lambert Conformal Conic projected coordinate system for Africa. Because the Widgren map represents land use as polygons, we converted these polygons to a raster map with a cell resolution matching that of HYDE. This enabled us to obtain an estimate of the maximum possible area associated with each of Widgren’s categories of land use. Using the average grid cell size of 78.96 km², comparisons in total area between HYDE and Widgren were carried out, with the cropland distribution of HYDE 3.2.1 clipped from the global cropland map to match the extent of the Widgren map (Table 1). Spatial analysis was conducted in ArcMap 10.8 and R v4.2.3 to compare the distribution of cropland in HYDE (converted to percentages) and the Widgren land-use zones.
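
The area bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not the ArcMap or R workflow used in the study: the function names are hypothetical, and treating every cell as having the average cell area (rather than its true latitude-dependent area) is an assumption made here for simplicity.

```python
# Side length of one resampled grid cell, as reported in the text (metres).
CELL_SIDE_M = 8885.95


def cell_area_km2(side_m: float = CELL_SIDE_M) -> float:
    """Area of one square grid cell in km2."""
    return (side_m / 1000.0) ** 2


def max_zone_area_km2(n_cells: int, side_m: float = CELL_SIDE_M) -> float:
    """Upper bound on the area of a land-use zone rasterised to n_cells cells."""
    return n_cells * cell_area_km2(side_m)


def cropland_percent(cropland_km2: float, side_m: float = CELL_SIDE_M) -> float:
    """Express a cell's cropland area (km2) as a percentage of the cell.

    Assumption: the average cell area is used as the denominator.
    """
    return 100.0 * cropland_km2 / cell_area_km2(side_m)
```

With the reported side length, each cell covers about 78.96 km², so a zone rasterised to n cells has a maximum possible area of n × 78.96 km².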

The percentage of cropland per cell in HYDE ranged from 0 up to 60% of the grid cell for Africa. In Widgren’s map, land use ranged from regions dominated by pastoralism/ranching to mixed farming and permanent fields and to intensive farming with rice. Extensive farming, as defined by Widgren (2018), included slash-and-burn and shifting cultivation and agriculture of undefined character. It should be noted, then, that Widgren is merely characterising the type of agricultural production employed, and that an unspecified and unknown proportion of this land use would be under cultivation at any one time.

The cropland percentages per cell from HYDE were then analysed in relation to Widgren’s categorical levels of land-use intensity and cultivation. In addition, because the different Widgren land-use zones vary in geographic extent (Fig. 1), we removed land-use zones that covered very large spatial extents, within which more localised areas of more and less intensive use might be expected (Fig. 2). The removal of these large polygons was necessitated by the very limited historical information about agriculture across such large areas. Hence, analyses were focused on the smaller areas for which stronger supporting archaeological data were available. The polygons chosen for the study were selected through a secondary review of the literature, to validate that the land-use types in the study areas matched those identified by Widgren and that their spatial extents roughly matched those in Widgren’s maps. In reviewing the documentary evidence used to develop the map, Widgren (2018) stresses that the archaeological and historical data allow the characterisation but not the quantification of different, more localised land uses within the region. To ensure consistency in analysis, the HYDE map was also further cropped to match the Widgren zones, and these two maps were used to conduct further spatial analyses on the relationship between land-use types and the distributions of cropland percentages, as well as to compare spatial patterns with environmental predictors (see section “Comparison of spatial patterns with environmental predictors” below).
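
Cropping the HYDE grid to the retained Widgren zones amounts to masking one raster by another. The sketch below assumes both grids are already co-registered (same shape, resolution and projection) and uses plain nested lists. The zone identifiers and function names are hypothetical and are not taken from the study's scripts.

```python
def mask_to_zones(hyde_pct, zone_ids, selected):
    """Keep HYDE values only where the zone grid holds a selected zone id.

    Cells outside the selected zones are returned as None.
    """
    if len(hyde_pct) != len(zone_ids) or any(
        len(h_row) != len(z_row) for h_row, z_row in zip(hyde_pct, zone_ids)
    ):
        raise ValueError("grids must have the same shape")
    return [
        [value if zone in selected else None for value, zone in zip(h_row, z_row)]
        for h_row, z_row in zip(hyde_pct, zone_ids)
    ]


def zone_pairs(hyde_pct, zone_ids, selected):
    """Flatten the retained cells to (zone_id, cropland_pct) pairs, row by row."""
    return [
        (zone, value)
        for h_row, z_row in zip(hyde_pct, zone_ids)
        for value, zone in zip(h_row, z_row)
        if zone in selected
    ]
```

The flattened pairs are the kind of per-cell table that a comparison of cropland percentage against land-use class would start from.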

Fig. 1 Map of HYDE cropland percentage (solid colour shading) compared with agricultural land use at 1800 CE from Widgren’s map (coloured outlines)

Fig. 2 Map showing selected Widgren agricultural land-use zones and the HYDE cropland percentages in these same zones at 1800 CE

Comparison of spatial patterns with environmental predictors

To understand systematic differences between HYDE and the archaeological and historical reconstructions, we fitted a series of regression models (generalised linear models, GLM, and cumulative link models, CLM) to each dataset, relating the values in each dataset to key covariates using R v4.2.3. To model the HYDE cropland as a function of the covariates, a GLM with a quasibinomial error family was employed, using the R MASS package. To model the Widgren land-use reconstructions as a function of the covariates, a CLM with a logit link function and flexible thresholds was employed, using the R ordinal package. The Widgren agricultural intensities were treated as an ordinal ranking in the analyses. The key covariates were the environmental predictors slope and soil suitability, which were used to evaluate the extent to which the spatial patterns of HYDE cropland and Widgren land-use distribution related to major drivers of agricultural suitability. These two allocation assumptions were selected from the parameters used in HYDE historical cropland allocation (Klein Goldewijk et al. 2017): average slope steepness from the ETOPO1 1 arc-minute global relief model (Amante and Eakins 2009; NOAA National Geophysical Data Center 2009) and soil suitability based on the Global Agro-Ecological Zones model (GAEZ v4) (Fischer et al. 2021; FAO and IIASA 2022). To account for the differences between historical and modern farming practices, we employed the GAEZ soil suitability index for rainfed agriculture with low inputs, under the assumption that the GAEZ low inputs category refers to traditional agricultural management that does not employ chemical fertilisers and other modern intensive inputs. This incorporates consideration of the dynamic nature of soils, beyond simply soil quality at a given time. The “Global land-use model limitations at the regional and local scale” section (below) considers further the challenges in applying modern soil suitability assessments to historic farming practices.
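
For reference, the two model forms can be written out as follows. This is a sketch under stated assumptions rather than the fitted specification: the logit link for the GLM is assumed here (it is the default for R's quasibinomial family), and the symbols $y_i$, $\mu_i$, $\phi$, $Y_i$, $\theta_j$, $\mathbf{x}_i$, $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are notation introduced for this illustration.

```latex
% Requires amsmath.
% First row: quasibinomial GLM for the HYDE cropland fraction y_i in cell i,
%            with mean mu_i and dispersion phi (logit link assumed).
% Second row: cumulative link model for the Widgren intensity rank Y_i,
%             with J ordered categories and freely estimated thresholds.
\begin{align}
  \operatorname{logit}(\mu_i) &= \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta},
  & \operatorname{Var}(y_i) &= \phi\,\mu_i\,(1-\mu_i), \\
  \operatorname{logit} P(Y_i \le j) &= \theta_j - \mathbf{x}_i^{\top}\boldsymbol{\gamma},
  & j &= 1,\dots,J-1 .
\end{align}
```

Here $\mathbf{x}_i$ holds the slope and soil suitability covariates for cell $i$. With flexible thresholds, $\theta_1 < \dots < \theta_{J-1}$ are estimated without further structure, and under this sign convention a positive element of $\boldsymbol{\gamma}$ shifts probability towards higher intensity ranks.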

Results

Patterns of cropland and land use across Africa

HYDE 3.2.1 for 1800 CE and Widgren’s 1800 CE maps are in good agreement that some level of agriculture was widespread across sub-Saharan Africa and Madagascar. HYDE shows higher levels of agricultural activity (10% or greater) over approximately 5.1% of the total land area in SSA, while Widgren’s more intensive agricultural land uses (sum of mixed, permanent, intensive and intensive with rice categories) constituted 5.8% of the total land area (Table 1, Fig. 1). However, as outlined in the “Comparison of cropland area and distribution across Africa” section above, the area in the Widgren land-use categories represents the maximum possible area given to agriculture, so the actual area of land cultivated at any given time could be far lower. Overall, the average cropland cover in HYDE was 2% (Table 1). In Widgren, the most common class of land use was extensive and undifferentiated farming (Table 1).

In contrast, we found limited agreement between the spatial distributions of croplands in Widgren’s map and that of HYDE (Fig. 1). Visual comparisons show that there is a large variation in HYDE cropland allocation within individual Widgren land-use classes: specifically, HYDE predicts zero cropland in some areas where Widgren’s archaeological reconstructions reveal that agriculture was present.

This variation in HYDE cropland allocation exists within each Widgren land-use type (Figs. 1, 2). However, since the Widgren land-use zones represent broad categories of land use within relatively large areas, it is possible that this level of extrapolation masks underlying similarities in the spatial distributions predicted by HYDE and by Widgren. To assess this variation, we focused on comparing the two maps in areas with the most reliable archaeological and historical data (Fig. 2).

For these locations (Fig. 2), we found evidence for a negative association between HYDE and Widgren estimates of cropland intensification (χ² = 686.55, d.f. = 4, P < 0.001, Fig. 3, Online Resource 1-Table S1), where HYDE % cropland decreases with increasing Widgren land-use intensity (Online Resource 1-Table S1). In these archaeologically well-understood areas, HYDE reports zero cropland in a number of locations where Widgren reports large areas dedicated to agricultural land use, even in historical landscapes where Widgren identified intensive farming (Fig. 3).
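
As a quick arithmetic check, the reported threshold P < 0.001 can be reproduced from the test statistic alone. The sketch below treats the reported value as a chi-square statistic with 4 degrees of freedom and uses the closed-form upper tail that exists for even degrees of freedom. The function name is hypothetical, and nothing here reproduces how the statistic itself was obtained.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k the tail is exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(686.55, 4)
print(p_value < 0.001)
```

The resulting tail probability is many orders of magnitude below 0.001, which is why the strength of the test says little about the practical size of the association.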

Fig. 3 (a) Histogram of the distribution (‘counts’ represent the numbers of grid cells falling into each HYDE cropland class) and (b) violin plot of the frequency and density distribution of the HYDE % cropland within the different Widgren land-use zones

Despite the statistically significant negative relationship between the two approaches to intensity, the overall appearance of the graphs is simply that there is almost no relationship between HYDE % cropland estimates per cell and Widgren land-use classes (Fig. 3b). Thus, the graphical representations and results of the generalised linear model indicate that there is a spatial mismatch between HYDE and Widgren maps in the locations of croplands (Fig. 3 and Online Resource 1-Table S1).

Effects of topography and soils on land-use allocations

We tested two key assumptions of the HYDE model on both the HYDE and Widgren reconstructions, given that the HYDE model uses both slope (flatter land preferred) and soil quality (more on better soils) as predictors of the distribution of croplands (Figs. 4, 6).

Fig. 4 (a) Distribution of HYDE cropland percentages and (b) proportion of Widgren land-use zones along the slope gradient (in degrees, range: 0–13°). Values at the top of each stacked column indicate the total number of 78.96 km² grid cells

As expected given the underlying HYDE assumptions, there is a strong negative association between the HYDE cropland % cover and slope (χ² = 1776.96, d.f. = 1, P < 0.001, Fig. 4a, Online Resource 1-Table S1). In contrast, Widgren’s land-use distribution showed a positive relationship between agricultural intensity rank and slope (χ² = 1436.00, d.f. = 1, P < 0.001, Fig. 4b, Online Resource 1-Table S2). Hence, HYDE croplands were more frequent on flatter areas while the opposite was true for Widgren agricultural intensity categories. At the extreme, HYDE allocates zero cropland to slopes > 10° (Fig. 4a), while Widgren’s map shows approximately 947 km² of intensive, mixed and extensive farming at slopes > 10° (Fig. 4b), associated for example with archaeological evidence of agricultural terraces. However, Widgren does show intensive rice farming in flatter areas, mostly in areas with slopes of 3° or less.
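
The slope comparison rests on simple per-cell tallies, which can be sketched as follows. This is an illustrative outline only: the function names, the 1° class width and the input layout (one `(slope_deg, land_use)` pair per cell, with `None` where no agricultural class is mapped) are assumptions introduced here, not details given in the text.

```python
from collections import Counter, defaultdict

CELL_AREA_KM2 = 78.96  # average grid cell area reported in the text


def farmed_area_above(cells, threshold_deg=10.0, cell_area_km2=CELL_AREA_KM2):
    """Area (km2) of cells steeper than the threshold that carry a farming class."""
    n_cells = sum(
        1 for slope, use in cells if use is not None and slope > threshold_deg
    )
    return n_cells * cell_area_km2


def shares_by_slope_class(cells):
    """Proportion of each land-use class within 1-degree slope classes."""
    counts = defaultdict(Counter)
    for slope, use in cells:
        if use is not None:
            counts[int(slope)][use] += 1  # int() floors non-negative slopes
    return {
        slope_class: {use: n / sum(c.values()) for use, n in c.items()}
        for slope_class, c in sorted(counts.items())
    }
```

Under this bookkeeping, the reported figure of approximately 947 km² is consistent with 12 grid cells (12 × 78.96 = 947.52 km²). That cell count is inferred here and is not stated in the source.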

The cropland distributions in HYDE and Widgren also differed in relation to the GAEZ soil suitability index for rainfed agriculture with low inputs (Fig. 5). HYDE allocates farmland primarily (but not exclusively) into high-quality soils ranging from moderate to high soil suitability (Fig. 6a), whereas Widgren’s archaeological reconstructions suggest that intensive farming occurred across a wide range of soil qualities from marginal to high soil suitability areas (Fig. 6).

Fig. 5 Map of soil suitability gradient (GAEZ index of soil suitability with rainfall, low inputs class) across Africa showing the selected sites of study. 0 refers to open water and SI = 0 refers to areas where the soil suitability index indicates zero suitability

Fig. 6 Stacked graphs showing (a) distribution of HYDE % cropland and (b) proportion of Widgren land-use zones in the different soil suitability classes, with the total number of 78.96 km² grid cells of HYDE and Widgren that fall in the different soil suitability classes included

We also found strong evidence of a complex but significant association between soil suitability and both HYDE cropland distribution (χ² = 697.81, d.f. = 9, P < 0.001, Fig. 6, Online Resource 1-Table S1) and Widgren land-use intensity rank (χ² = 3126.4, d.f. = 9, P < 0.001, Fig. 6, Online Resource 1-Table S2). In both HYDE and Widgren, large proportions of cropland were associated with areas of marginal to high soil suitability; HYDE cropland in these areas ranged from 1 to 20% cropland (Fig. 6a), while Widgren land-use classes showed greater proportions of mixed farming, permanent fields and intensive farming (Fig. 6b). In areas where soil suitability ranged from marginal to zero suitability, HYDE cropland allocation of < 1% cropland dominated while for Widgren, as the soil suitability decreased, extensive farming, permanent fields and intensive farming (with and without rice) predominated.

Discussion

We found no general agreement, and in some instances negative associations, between modelled (HYDE) projections and the archaeological and historical evidence in the spatial distributions of historical cropland for Africa. While the findings of this research do not give absolute areas of the historical cropland, the results show that HYDE is missing key locations for agriculture in Africa and highlight the ways in which the cropland allocation rules require further development. Areas that corresponded to increased agricultural intensification in Widgren’s reconstructions did not have corresponding increases in HYDE % cropland allocation, and Widgren evidenced agricultural land use in areas where HYDE indicated no cropland allocation (Fig. 3). Although we have not explicitly studied the pastoralist areas, our initial assessment (Fig. 1) shows that HYDE overestimated agriculture in areas where Widgren indicated pastoralism. However, the extent to which there may have been localised areas of temporary (sporadic or seasonal) cultivation requires further documentation. HYDE underestimated cropland in areas of steep slopes > 10° as compared to Widgren. The archaeological and historical reconstructions also highlight more diverse land use and more intensive cropland distribution in areas where soil suitability is considered marginal or not suitable by the Global Agro-Ecological Zones model (Fischer et al. 2021; FAO and IIASA 2022).

Global land-use model limitations at the regional and local scale

The HYDE global land-use model is fairly good at reconstructing land use at the global scale but misses important regional variations. HYDE fails to capture substantial areas of agricultural land use in mountainous and sloped areas, which in Africa is where many communities have historically practised farming (Fig. 4). Equally, it exaggerates the areas of croplands in flat landscapes. Some of the mismatches between HYDE and Widgren likely arise from differences in agricultural practices in space (e.g. land use patterns in Europe vs Africa) and time (e.g. in Africa following the colonial period) which cannot be extrapolated reliably to a particular historical context (here, in Africa prior to the colonial period). African communities in 1800 CE made greater use of these steeper landscapes due to a variety of factors that might not be applicable in other geographical or historical contexts. For example, greater water availability in mountainous regions and highland water catchments, security (defensive positions) of living on hillsides, avoidance of extreme temperatures in the highlands and reduced threat of disease vectors all likely contributed to the 1800 CE association of human populations and croplands in Africa with relatively steep slopes (Widgren and Sutton 2004; Stump 2013). Many of these factors would not have influenced settlement and agriculture in Europe, so modelling assumptions based on Eurocentric agricultural practices would miss them. However, these details need to be carefully incorporated into the HYDE models; while Widgren correctly identifies the gaps in HYDE cropland allocation on slopes, simply relaxing HYDE’s rules to include steeper terrain could result in overestimation or improper distribution of cropland to areas where agriculture has not been identified by Widgren maps, even when rainfall, slope, aspect and other factors are taken into consideration.
The generalisations provided by global models such as HYDE are useful for assessing past land use but represent compromises between simplicity and detail based on the available data. Reconstructions such as those by Widgren provide an avenue to incorporate regionally and locally relevant archaeological evidence into historic land-use models. Archaeological reconstructions can pick up areas where historical population data is limited in the HYDE model. It is important to recognise that the Widgren map and HYDE model outputs differ in their spatial resolutions, particularly in regions where archaeological and historical data are sparse and the Widgren polygons are necessarily large. Hence, the research presented here focussed on the smaller polygons. For these regions, we can conclude that HYDE’s allocation rules miss key agricultural areas that were identified by Widgren’s historical and archaeological evidence. HYDE’s use of modern settlement data when hindcasting raises the issue of historical contingency, which is linked to HYDE’s embedded assumption that areas of modern high population density are legacies of past settlement. Post-colonial population geographies in Africa are not necessarily strongly correlated with historical population concentrations, a point that also applies elsewhere in the world, for example following the collapse of pre-Columbian populations in the Americas in the 1500s. The need to incorporate more regionally focused population datasets has also been seen in China and Germany (Wu et al. 2020; Zhang et al. 2021). Limited population data and the reliance on model assumptions that tie population and labour to land use have led to an overestimation of uncultivated land in the African context (Austin 2008).
The results here help identify ways in which HYDE’s generalised allocation rules could be improved at the regional scale and point to ways in which we can incorporate historical and archaeological evidence into these models to refine historical LULCC reconstructions (see section “Inclusion of regional expertise necessary in model design” below).

Widgren’s historical and archaeological reconstructions highlight more diverse land use and more intensive cropland distribution in areas where soil suitability is considered marginal or not suitable based on the GAEZ soil suitability index, while HYDE tends to underestimate cultivation in marginal lands. Historic farming practices would have altered soil suitability through a number of processes, such as terracing, irrigation, the importation of nutrients through livestock dung and other cultural practices to improve soil moisture retention and nutrient levels (Widgren and Sutton 2004, Stump 2010, 2013; Kay and Kaplan 2015; Kay et al. 2019). The empirical evidence for substantial agricultural activities in areas today regarded as having low soil suitability (see Fig. 6) suggests that communities likely invested energy into providing these additional inputs. These human investments illustrate that agricultural soil quality depends on farming practices, and hence, quality can change through time (both increasing and decreasing). While efforts were taken to factor in the historical context of soil quality by employing the GAEZ soil suitability index for rainfed agriculture with low inputs (see section “Comparison of spatial patterns with environmental predictors” above), soil quality is not simply an inherent value but depends on environmental and socioeconomic contexts that change with time and across space. Hence, the GAEZ map, which is itself based on modern datasets, may not be a particularly strong predictor of historic soil quality, given that factors such as rainfall and land degradation as well as the introduction of mechanisation will have changed since 1800 CE.

Archaeological reconstructions are an important additional resource for improving global reconstructions of land use and land cover, not a means of completely replacing these models. Archaeological reconstructions such as Widgren (2018) and Kay and colleagues (Kay and Kaplan 2015; Kay et al. 2019) provide a means by which to refine our understanding of the geographical distribution and extent of historical human settlements and agriculture that can be used as proxies for population datasets. Ongoing research such as the LandCover6K Project (Morrison et al. 2021) also aims to improve the accuracy of reconstructed land cover datasets using historical archives, archaeological evidence and palaeo-ecological data (Harrison et al. 2020; Wu et al. 2020). The historical context of LULCC in Africa prior to the satellite/remote sensing era is still limited, particularly for agriculture; however, archaeobotanical research sheds some light on the earlier and more extensive cultivation and domestication of indigenous cereals that has occurred (Fuller and Hildebrand 2013; Stephens et al. 2019). Even so, the patchy distribution of sites across the landscape, the limited datasets for given sites and the uncertainties brought about by ethnographic, historic and linguistic evidence make it difficult to pin down precise locations for land uses, and transitions in LULCC are not as clearly defined. In addition, the archaeological reconstructions are limited by data availability and distribution, which means that they do not cover broad geographic regions effectively or provide a wide range of high-resolution datasets across different countries, hence the high levels of extrapolation between known areas of historic land use as acknowledged by Widgren (2018). Thus, it cannot be said that one form of historical reconstruction is more accurate than the other when trying to assess land use at the regional scale. Instead, the archaeological evidence presents a valuable additional data source that can be incorporated into these global land-use models to address gaps in reconstructing the extent and distribution of historical land use. Further research into ground-truthing the archaeological data would thus support the incorporation of these empirical data into the models by providing clearer information on the extent and distribution of historical land use.

Inclusion of regional expertise necessary in model design

Some of these oversights in global land-use models could have been avoided by the inclusion of wider and more diverse research perspectives from different regions, which could have provided a more nuanced understanding of the cultural decision-making involved in agricultural systems. An understanding of regional and cultural differences in agricultural land use can then support land use and biodiversity modelling (see section “Implications for historical land-use and biodiversity modelling” below), allowing us to explore the processes involved in the development of regions of high biodiversity in agricultural landscapes, such as niche construction practices that supported soil and water conservation (Ekblom et al. 2019). This is not to say that greater detail would necessarily mean more precision, but that incorporating regional perspectives in the weighting maps for cropland allocation might allow these global land-use models to better represent past land use. As discussed in the “Global land-use model limitations at the regional and local scale” section above, simply relaxing the generalised allocation rules for the global models presents its own challenges in model design; the development of regional allocation rules that take into consideration the cultural contexts of agricultural decision-making provides an avenue to refine these models. However, these regional allocation rules tend to require increased data processing and computational effort to employ; as such, designing smaller, self-contained regional sub-models for specific regional studies might present a way forward.

In addition to regional perspectives, the development of high-resolution sub-models within the global models, in a similar manner to global climate models which downscale to regional models, would provide an avenue by which to improve global land-use models (Ridding et al. 2020; Verhagen et al. 2021). The development of regional models would facilitate the expansion of this research to other regions such as South America where regional archaeological reconstructions can be combined with expertise on local cultural practices that differ from those derived from Eurocentric assumptions to improve regional and global land-use modelling.

Implications for historical land-use and biodiversity modelling

The findings of this research have broad implications for understanding the effects of historical land use on patterns of LULCC and biodiversity distribution and for modelling these interactions. LULCC is one of the key drivers of biodiversity change (Haines-Young 2009), and a clear understanding of the past land-use changes that impact present biodiversity is integral to understanding biodiversity in the Anthropocene (Thomas 2020; Dornelas et al. 2023). Agricultural land use can have unintended ecological effects whose legacies can be felt in the present (Ekblom et al. 2019; Marston 2021). Gaps in our knowledge of cropland distribution have significant impacts on models of biodiversity change such as the PREDICTS model (Hudson et al. 2014), which assesses how biodiversity changes under different scenarios of land use. Future research can aim to incorporate regional archaeological evidence into these global Earth system models to project the impacts of historical land-use change on past and present biodiversity, and so to understand the processes and legacies of these past land uses. Because under- or overestimation of past agricultural land use can affect our understanding of biodiversity changes in the past and into the present, incorporating archaeological reconstructions into the global land-use models can enhance the modelling of historical land use and biodiversity interactions.

Conclusion

Global land-use models such as HYDE have significant gaps in their representation of historical land use at the regional level in sub-Saharan Africa, and these gaps might also be replicated in other regions of the world. The HYDE model misses key agricultural areas, an omission that stems from its crop allocation assumptions. However, it should be noted that differences in spatial distributions between the HYDE model and the Widgren historical and archaeological data (as discussed in section “Global land-use model limitations at the regional and local scale” above) could impact the precision of these comparisons. Recognising these gaps in HYDE has broader implications for other model projections that use historical land-use models to understand present and future trends in land use, biodiversity change and climate modelling. There are a number of possible reasons for these gaps, but in broad terms, they appear to result from spatial and temporal differences in agricultural practices that cannot be extrapolated reliably to a particular historical context within the HYDE model, given that the model prioritises locations on the basis of highest modern population density, topography and soil quality, which do not necessarily reflect local cultural practices.

While there are no simple model design solutions to the inconsistencies identified in HYDE, the archaeological and historical evidence can be used to identify where the generalised allocation rules are incorrect, and to point to ways of incorporating these empirical archaeological data to refine the allocation rules wherever possible. Historical land-use models therefore need to include regional expertise to ensure cultural and localised practices are effectively incorporated. It is thus recommended that global models such as HYDE aim to incorporate local expertise, refine model design with archaeological data and reconstructions and develop regional high-resolution sub-models in order to improve LULCC projections and model functionality for climate and biodiversity modelling. In addition, it is imperative that further testing is conducted on whether these spatial and temporal differences exist outside Europe, perhaps starting with areas with extensive archaeological evidence such as in South and Central America. One possible solution could involve the development of regional archaeological datasets on the extent and distribution of historical agriculture based on large-scale reviews and ground-truthing of archaeological and historical evidence of past land uses. These archaeological and historical datasets can then be incorporated into HYDE and other global land-use models as proxies for historical settlements and land use. In this way, global models would be able to incorporate regional variations that take into consideration cultural contexts while limiting the need to overhaul the allocation rules completely.