Introduction

Tropical wetlands provide important ecosystem services, including global regulating roles in biogeochemical cycles and biosphere-atmosphere interactions. They store large amounts of carbon, are a sink for atmospheric CO2, and make up around 60-80% of the natural atmospheric methane (CH4) source (Chmura et al. 2003; Köchy et al. 2015; Rice et al. 2016; Saunois et al. 2020; Whiting and Chanton 2001; Zhu et al. 2017). In addition to being part of natural hydrology systems and to ensuring water quality and regulating its flow (IPBES 2019), they are also key reservoirs of biodiversity, they support fisheries, they are important sources of food, filter and retain sediments, and are culturally and aesthetically important to many communities and cultures (Mitsch et al. 2015; Shiel et al. 2016). One study estimated that 25% of the global value of ecosystem services is provided by wetlands (Costanza et al. 2014).

Wetlands soils, under regular anaerobic conditions, reduce the decomposition of organic plant materials, resulting in the accumulation of organic matter. This enables wetlands to effectively store substantial amounts of soil organic carbon (SOC) (Nahlik and Fennessy 2016). Despite wetlands only occupying about 5-8% of the earth's surface, (Mitsch and Gosselink 2007), point out that they store more SOC than all types of vegetation on earth. Wetlands are believed to store between 20-30% of the world's soil organic carbon (SOC), and in some cases, the SOC concentration can exceed 40% (Vepraskas and Craft 2016). This characteristic makes wetlands a key ecosystem for regulating water, climate, and biodiversity.

In this order, the development of accurate maps of wetlands and soil organic carbon (SOC) is crucial, as it enables an adequate inventory and monitoring of these carbon-rich ecosystems, leading to better quantifications of SOC stocks and greenhouse gas (GHG) emissions (Page et al. 2011). As stated by Minasny et al. (2020), a better and rapid estimation of peatlands distribution allows stakeholders to make informed decisions regarding the management, conservation, and restoration of these ecosystems. Nonetheless, the extent and state of tropical wetlands and the quantification of soil organic carbon stocks are still uncertain (Xu et al. 2018). Köchy et al. (2015) estimated that they cover around 9% of the tropical land area and contain around 40 Pg of soil organic carbon (SOC), mostly in floodplains and marshes. In addition to uncertainty about the extent of tropical wetlands, the carbon density or carbon stock per unit area is also uncertain. Assessments of SOC storage vary widely due to both definition and methodological differences, from about 120 Gt to 535 Pg of SOC (Mitra et al. 2005). This is an important constraint in earth systems modelling, where lack of data on the carbon balance and gas fluxes in tropical wetlands limits accurate predictions about future climate and the importance of wetlands in climate change (Melton et al. 2013; Sjögersten et al. 2014).

South America has the largest extent of tropical wetlands, most of them found in floodplains where they are subject to seasonal fluctuations in water levels (Gumbricht et al. 2017; Junk et al. 2013). The continent has large extents of seasonally flooded savannas, including the Pantanal, Bananal Island, and the Lavrado in Brazil; the Iberá in Argentina, the western Guianan flooded savannas in, Guiana, Venezuela, and Suriname; the Llanos de Moxos in Bolivia; and the Llanos del Orinoco (Barbosa et al. 2012; Hamilton et al. 2002), which is the subject of this study. The Llanos del Orinoco cover about 38.31 Mha (Barreto and Armenteras 2020) in the Orinoco River Basin of Colombia and Venezuela. The Llanos del Orinoco landscape is a mosaic of ecosystems that include forests, extensive areas of Moriche (Mauritia flexuosa) palms, upland savannas, and flooded savannas. The flooded area in this region shows significant interannual variation. Hamilton et al. (2004) found that the long-term mean inundation area was about 3.47 Mha, with a median area of 2.53 Mha, which makes it the second largest flooded savanna wetland in South America after the Brazilian Pantanal. For this study we focused on the flooded savannas of the Casanare in Colombia, which drains into the Meta River, one of the major tributaries of the Orinoco River that originates in the Andes. Few data exist for this region, but the area is undergoing rapid change with the introduction of exotic pasture grasses and extensive areas of monoculture cropping. Additionally, fire has been used in this landscape for centuries, although its duration and intensity in the flooded savannas typically is low due to the presence of rocky outcrops, sand dunes, and low biomass on sandy patches, which are spread throughout the landscape (Lasso et al. 2010).

SOC accumulation in flooded savannas appears to be related to length of flooding; short-term measurements in the flooded Colombian Llanos del Orinoco suggest stable to slightly accumulating SOC stocks (Vega et al. 2014). The objectives of the present study were to map the spatial distribution of SOC content and stocks throughout two digital soil mapping (DSM) approaches and determine the factors that influence their variability. To do this, we focused on the eastern Casanare Department in Colombia, where the largest extents of flooded savannas in the Orinoco Basin are located (Gumbricht et al. 2017; Hamilton et al. 2004).

Material and Methods

Study Area

The study area (Fig. 1), known as the “Sabanas Inundables de Casanare” (Casanare flooded savannas) is part of an extensive floodplain ecosystem located in the “Llanos” ecoregion in the Orinoco River Basin (Olson et al. 2001). With a total area of 664,752 ha, the study area is located between the latitudes 6.30°N and 5.52°N and longitudes 71.07°W and 69.84°W. It has an elevation ranging from 78 to 146 m.a.s.l. and an average slope of 1.55 degrees. The annual average temperature in the region is approximately 26°C, with an average maximum temperature of 33°C during the dry season and an average minimum of 21°C during the wet season. Rainfall distribution follows a unimodal regime with total annual average precipitation of 2,684 mm, characterized by a rainy season from April to November, during which most of the area is flooded, and a short dry season from December to March when the water drains and many of the flooded wetlands dry out (Buriticá Mejia 2016; Castillo-Figueroa et al. 2019; Lasso et al. 2010).

Fig. 1
figure 1

Study location in the Casanare Flooded Savannas, Colombia. a) location within the tropical savannas ecoregion “Llanos” (Olson et al. 2001) b) position within the Casanare department (dark red line) and c) local land cover map of the study area

The land cover is mainly composed of native savanna, native forest, and permanent water bodies. Native savanna is sub-classified in this study as seasonal savanna, hyperseasonal savanna, and semiseasonal savanna, based on its vegetation and soil-water conditions (Cabrera-Amaya et al. 2020; Mora-Fernández and Peñuela-Recio 2013; Peñuela et al. 2014; Romero Duque et al. 2018; Sarmiento 1983). Seasonal savanna is a tree savanna, characterized by discontinuous grasses, dicotyledons and herbaceous vegetation, with the presence of scattered trees and shrubs. It develops under a bi-seasonal regime marked by a dry season (4 to 6 months) and a wet season, in which it remains flood free. In the dry season, the vegetation acquires a wilted appearance, and large numbers of plants die off, increasing fire frequency. This ecosystem is located mainly on plain-convex banks, continental dunes and sandy deposits, where coarse-textured soils, compaction, and low water availability create difficult conditions for vegetation growth. Hyperseasonal savanna is the most widely spread ecosystem in the region. It is mostly a treeless savanna, where perennial grasses and annual dicotyledons predominate and sporadic woody plants, shrubs, and palms are found. This ecosystem is temporarily flooded during the rainy season, where water table level can reach up to 20 cm above the surface, but it dries out quickly after the rain ends. It experiences four seasons: drought, onset of soil saturation, flooding, and decline in soil saturation. Soil texture is mostly represented by sandy-loam and clay-silty loam. Semiseasonal savanna is a treeless savanna, with very rare presence of woody plants or palms. Vegetation (grasses, sedges, march and aquatic plants) varies in function of flooding duration and water table level, which could reach up to 1 m above the surface. This ecosystem is found in topographic depressions with poorly drained conditions, and clay soils. Semiseasonal savannas experience long periods of flooding (between 8 and 11 months) and never present water deficit, even during the dry season, serving as an important food and water source in the region. Similarly, the native forest includes Moriche palm and gallery forests. Moriche palm corresponds to plant communities that are dominated by Mauritia flexuosa palms and numerous herbaceous and shrubby species. They are present in flooded areas or in soils with medium-fine textures that do not present water deficits during the dry season. Gallery forest, in turn, is dominated by a great variety of woody trees, palms, and shrubs and are developed close to permanent water bodies, such as riverbank levees, alluvial, and overflow plains of rivers, channels, and ravines.

For mapping purposes, a local land cover map derived from remote sensing images (Fig. 1) was created through a supervised image classification algorithm (further information in “Environmental Covariates” section). After classification, five categories were defined: seasonal savanna, hyperseasonal savanna, semiseasonal savanna, forest, and water. Gallery forest and Moriche palm were combined within a unique forest category since their spectral signatures were not clearly differentiated.

Soils of Casanare flooded savannas are predominantly Inceptisols and Entisols formed from alluvial, fluvio-torrential, and fluvio-glacial sediments deposited from the upper Pleistocene to the lower Holocene, upon which Aeolian materials subsequently accumulated (IGAC 2014). Relief is mainly composed of large extensions of aeolian and alluvial terraces, dissected by depressions and channels covered by fine materials (floodplains), which remain flooded for a great part of the year, and some elevated sandy areas called “Medanos” or continental dunes, which have been formed longitudinally by the northeasterly trade winds (IGAC 2014). The combination of the regional geomorphology and the high annual precipitation creates the hydrological connectivity between rivers and wetlands, resulting in the maintenance of water throughout the region for most of the year (Sarmiento 1984).

Data Collection and Soil Characterization

A total of 80 sites were sampled in February 2019 at two soil depth intervals (0-10 cm and 10-30 cm), resulting in 160 soil samples (disturbed and non-disturbed). The sites were sampled following two strategies: 43 sites located in four transects along the peat depth distribution predicted by the Global Wetlands Map (Gumbricht et al. 2017) and 37 sites distributed across the study area based on the conditioned Latin hypercube sampling (cLHS) strategy (Fig. 2). The cLHS method is a stratified random procedure for sampling variables from a multidimensional distribution of covariates, which defines sampling locations that capture the greatest variability among covariates (Minasny and McBratney 2006). Maps of covariates used in the cLHS to define the sampling site locations included: the Orinoquía land cover map at 30 m resolution ; geology and lithology at a scale of 1:100.000 (IGAC 2014); peat depth and wetland predictions with a spatial resolution of 123 m (Gumbricht et al. 2017); landforms (Jasiewicz and Stepinski 2013) with a spatial resolution of 12 m (derived from the DEM); and the cost distance, which represents the cost of reaching each point in the landscape from the road network based on topography (slope at 12 m resolution) and distance from roads (Aitken 1977; Roudier et al. 2012). The Orinoquía land cover map of Colombia was developed from Landsat 5-8 images at a resolution of 30 m, the landforms were mapped using the DEM ALOS/PALSAR scaled up at a resolution of 12 m based on the geomorphons approach (Jasiewicz and Stepinski 2013), and cost distance was mapped according to Douglas, (1994). Due to the adverse field conditions, some sampling sites needed to be readjusted to locations with similar conditions, keeping original sampling characteristics (Fig. 2).

Fig. 2
figure 2

Spatial distribution of the original cLHS sampling sites (purple points), field sampled sites (blue points) and peat depth (m) distribution predicted by Gumbricht, et al. (2017)

The soil samples were air-dried, sieved, and analyzed at the International Center for Tropical Agriculture (CIAT) laboratory for determining bulk density (BD) and soil texture (clay, sand, silt). BD and soil texture were performed by the volumetric ring (Blake 1965) and hydrometer (Bouyoucos 1936) methods, respectively. SOC, stable 13C isotope (δ13C), total nitrogen (Nt), and stable 15N isotope (δ15N) analyses were performed using the dry combustion method and mass spectrometry at the UC Davis Laboratory. Carbon to nitrogen ratios (C:N ratios) were calculated from SOC and Nt.

Other characteristics such as land cover (seasonal savanna, hyperseasonal savanna, semiseasonal savanna, gallery forest and Moriche palm); geomorphic classification (rise: slightly elevated areas (1-3% slope), talf: very low slope gradients (0-1%), dip: depressions, and continental dunes: aeolian mounds of sand (> 3% slope)) defined according to Haskins et al. (1999) and Soil Science Division Staff (2017); and generalized soil texture (sandy – coarser textures: sand, loamy sand, sandy loam; loamy – medium textures: loam, silt loam, silt, sandy clay loam, clay loam, silty clay loam; and clayey – finer textures: sandy clay, silty clay, clay) simplified from the soil texture taxonomy classification (Soil Science Division Staff 2017) at each sampling site were reported.

Statistical Analysis

Descriptive statistics of chemical and physical soil properties (0-10 cm and 10-30 cm) were done using SAS Studio. An ANOVA on log-transformed data, followed by the Student-Newman-Keuls test was carried out for normally distributed data, while the Kruskal-Wallis non-parametric test followed by a pair-wise separation using the Dwass, Steel, Critchlow-Flinger method was applied for data that could not be normalized.

SOC content at each depth was analyzed according to different environmental conditions: land cover, geomorphic feature, and generalized soil texture. Statistical analyses considered were data normality, homogeneity of variances, and design balance analysis tested using Shapiro-Wilk (“stats” R package), Levene (“car” R package), and the balance test (“nlme” R-package) functions. A Kruskal-Wallis non-parametric test followed by the Dunn post-hoc test was performed to identify significant differences among SOC comparisons. In both cases, significant tests were carried out using the “PMCMRplus” R-package including a Bonferroni p-adjustment method.

Environmental Covariates

To map the distribution of SOC content at 0-10 cm and 10-30 cm using digital soil mapping (DSM) (McBratney et al. 2003) a set of 32 environmental covariates was considered (see Table S1 in the supplementary material). Terrain attributes were derived from the DEM – ALOS/PALSAR scaled up at 12 m spatial resolution (Shimada et al. 2014). Geomorphology maps at a scale of 1:100.000 were obtained from the geoportal of the Instituto Geográfico Agustin Codazzi (IGAC). Because of lack of soil texture differences at the two depths, soil texture was generated by using the DSM approach random forest at depth of 0-30 cm. Vegetation indices (VI) were calculated in Google Earth Engine from SENTINEL-2 spectral images for 3 distinct periods (dry: December (2018) – February (2019), wet: May (2018) – October (2018), transition dry-wet: November (2018) – December (2018)). Climate data (temperature and precipitation from 2010 to 2018) was obtained from CHELSA and MODIS spectral images. The local land cover (Fig. 1) map was created through a supervised classification method combining field data and SWIR-1 (B11), near-infrared (B8), and blue (B2) bands from Sentinel-2 images.

Digital Soil Mapping

DSM combines different environmental covariates according to the soil-forming factors defined by Jenny (1941) with field soil point data in a mathematical model to produce high-resolution soil maps (raster/pixel-based) that are continuum and variable in the space (McBratney et al. 2003; Moore et al. 1993; Zhu et al. 2001). Two DSM approaches, Expert Knowledge (EK) and Random Forest (RF), were used to predict and map SOC content distribution at 0-10 cm and 10-30 cm of soil depth. Both models were calibrated using a random set of 70% of the field data (training dataset; same points for each model) and the other 30% of the field data (validation dataset) were used for model assessment.

The EK model was based on the Soil Inference Engine (SIE) system described by Shi et al. (2004). This model combines expert knowledge of soil scientists with fuzzy logic by creating instance rules based on the spatial correlation between soil properties and environmental covariates (Ashtekar and Owens 2013; Heitkamp et al. 2020; Zhu 1997; Zhu et al. 2001). To perform the model, four units were defined according to the most dominant local land covers (Fig. 1) and their respective instance rules were set to create a Membership Function (MF) map (Zhu 1999, 1997) using the training dataset. The MFs were created using the FRBS and sets R-packages in which two-piece normal and sigmoid functions were identified as more frequent descriptors of the soil-landscape relationship within every unit. After individual environmental covariate’s fuzzification was done, a weighted average was computed between the MF within every instance in order to get the overall optimality value for the unit.

The covariates used for mapping SOC with EK were chosen according to the SOC-landscape relationship observed during the sampling campaign, prior knowledge from the literature, information collected during the field trip, and the distribution of soil samples data across the environmental covariates. Scatter plots were created for each covariate using SOC content values as dependent variable to understand and create SOC-landscape relationships. Categorical covariates which did not provide clear differentiation by grouping many sites into each category were discarded.

The MF maps were used to map the continuous SOC content from the weighted average of the typical property values sampled in the field for each soil unit according to the following equation (Zhu 1997):

$${V}_{ij}=\frac{\sum_{k=1}^{n}{{S}_{ij}}^{k}{v}^{k}}{\sum_{k=1}^{n}{{S}_{ij}}^{k}}$$
(1)

Where: \({V}_{ij}\) is the estimated potential value at location \(i,j\); \({{S}_{ij}}^{k}\) are the fuzzy membership values at each location (similarity values); \({v}^{k}\) is a typical value of soil type k.

Random Forest (RF) is a machine learning method used to predict continuous and categorical variables from a large collection of non-correlated decision trees (Breiman 2001). The construction of each tree depends on the classification of the environmental covariates from the training dataset. To select the most appropriate covariates to enter the model, a Variance Inflation Factor (VIF) diagnosis was carried out and covariates with high multi-collinearity were identified (Alin 2010). VIF values close to zero represent null or scarce collinearity, while values above 5 indicate high collinearity between variables (Allam et al. 2020; Vu et al. 2015). In this study, the VIF assessment was done independently for each depth, removing variables with VIF greater than 5 in order to reduce overfitting and improve models’ performances (Valbuena et al. 2017).

The RF was performed with the randomForest R-package (Liaw and Wiener 2003), using the training dataset and ntree: 1,000. In order to identify which covariates mainly influence the prediction of the SOC, we computed the variable importance assessment mean decrease in accuracy (%IncMSE), which is the percentage increase in mean squared errors (Zhou et al. 2019). The most important variables were defined as those that resulted in a significant reduction of predictive power when removed from the model (Behrens et al. 2014). The most relevant covariates for predicting SOC content based on the RF approach in the Casanare flooded savannas were established from the top-five most important covariates used at 0-10 cm and 10-30 cm soil depth models.

DSM Assessment

Model performance for EK and RF approaches was assessed based on three statistical indices applied to the validation dataset: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R2). Lower values of RMSE and MAE indicated better model performance, while higher values of R2 indicated that the model explained a higher proportion of the variation.

SOC Stocks Spatial Distribution

SOC stocks (t ha-1) at 0-30 cm soil depth were calculated for each sampling point by multiplying the SOC content and bulk density using equation (2) and subsequently compared under different geomorphic features and land cover conditions through a mean comparison test (“Statistical Analysis” section).

$$SOC\;Stock\;(ton/ha)=SOC\cdot BD\cdot\;Depth\cdot(1-G/100)\cdot10$$
(2)

Where: \(SOC\) is the soil organic carbon content \((\mathrm{\%})\); \(BD\;is\;the\;soil\;b\) ulk density \(\left(Mg/{m}^{3}\right);\) \(Depth\;is\) the depth of the soil layer \(\left(m\right)\); \(G\) is the volume (%) occupied with coarse fragments (gravel or stones) and 10 is the conversion factor to t/ha. Note that sampled soils in the area did not contain coarse fragments.

The spatial distribution of SOC stocks was modeled using the best-performing DSM approach for SOC content at 0-10 cm and 10-30 cm and the respective BD maps developed previously for the study area with a random forest model. Basic statistics of predicted SOC stocks per land cover were calculated .

Results

Soil Properties Characterization

Descriptive statistics of the measured physicochemical soil properties are presented in Table 1. The highest mean values of SOC and Nt contents were found in the semiseasonal savannas and the lowest values were found in the seasonal savanna. Isotopic signatures of δ13C indicate the relative proportion of carbon originating from trees and shrubs with a C3 photosynthetic pathway or grasses with a C4 photosynthetic pathway. As expected, depleted values in the forest stands showed carbon from predominantly C3 sources, while both seasonal savanna and hyperseasonal savanna showed predominantly C4 sources. Depletion was intermediate in the moriche palm and semiseasonal savanna showing mixed sources of SOC in these systems. The measurement of δ15N provides an integrated index of the N cycle and all sites showed δ15N enrichment indicating an open or “leaky” N cycle (Craine et al. 2015). Enrichment was lower in the Moriche palm and semiseasonal savannas compared to the other land covers, suggesting that these ecosystems were less leaky. Carbon to nitrogen ratios were similar across land covers.

Table 1 Descriptive statistics of chemical and physical soil properties for the different land cover in the flooded savanna landscape of Casanare. Values are presented for two soil layers 0-10 cm and 10-30 cm depths. Values are mean ± SD and values for each parameter followed by the same letter are not significantly different from each other (P < 0.05)

Soil bulk density ranged from 0.74 to 1.42 g cm-3 (0-10 cm) and from 1.02 to 1.46 g cm-3 (10-30 cm). Higher values of BD were measured in the seasonal savanna, which are characterized by a high sand content, while lower BD values were found in the semiseasonal savannas as a result of higher soil organic matter and clay contents. SOC stocks were significantly higher in clay soils and semiseasonal savannas compared to seasonal savanna and forest.

SOC Distribution Across the Different Environmental Conditions

Measured SOC content at soil depths of 0-10 cm and 10-30 cm and its distribution across the environmental conditions of geomorphic features, soil textures, and land cover are presented in Fig. 3. In general, SOC content was around 1.64% higher on average at soil depth 0-10 cm than at 10-30 cm in all environmental conditions.

Fig. 3
figure 3

Boxplot diagrams showing soil organic carbon (SOC) content (%) distribution at soil depth 0-10 cm and 10-30 cm under different environmental conditions in the Casanare flooded savannas, Colombia. Grey dots represent single values of sampled soils, the black horizontal line in the middle of the box represents the median value, lower and upper boundaries indicate the 25th and 75th percentiles, blue crosses represent outliers, vertical black lines (whiskers) denote the maximum and minimum values (excluding outliers). Uppercase and lowercase letters denote statistically significant differences at P < 0.05 using the Dunn post-hoc test at 0-10 cm and 10-30 cm, respectively

Among the different geomorphic features, dips had significantly higher SOC content than talfs, rises, and continental dunes. Rise features did not show differences with talfs and continental dunes, while these last two differed significantly at both soil depths. The highest SOC content of the study area (14.5%) was found at soil depth 0-10 cm in the dip feature, while the highest value in talf, rise, and dune for the same depth were 6.3%, 5.5%, 1.3%, respectively. The largest mean reduction of SOC content between soil depths occurred in the dip feature, where mean SOC content decreased from 7.2% at 0-10 cm to 2.82% at 10-30 cm, followed by talf (2.48% » 1.28%), rise (2.01% » 1.02%) and continental dune (0.61% » 0.38%).

Differences in SOC content were also observed among the different land covers. For soil depth 0-10 cm, SOC content was higher in semiseasonal savannas, with values ranging from 4.2% to 14.5%, and lower in seasonal savanna, with values ranging from 0.4% to 3.6%. For soil depth 10-30 cm the same pattern was observed, but the differences in SOC content among land covers were lower than those observed in the surface layer.

Soil texture also played an important role in SOC accumulation for this region. Although loamy soils presented SOC content at the surface up to 14.5%, the average was 3.8% and this was much lower than the minimum SOC content (4.2%) found in clayey soils. Coarse textured soils had the lowest SOC content, with values inferior to 3.6% (0-10 cm). At 10-30 cm soil depth, higher values of SOC were found on clayey and loamy soils and lower content in coarse soils.

Assessment of DSM Approaches

The performance assessment using the validation dataset showed small differences of RMSE, MAE, and R2 between the EK and RF approaches at both soil depths, with smaller prediction errors (RMSE and MAE) at the 10-30 cm depth (Table 2). The RF presented lower values of RMSE and higher values of MAE and R2 compared to the EK at 0-10 cm, while at 10-30 cm EK presented lower values of RMSE and MAE and higher R2 values. Figure 4 shows the scatter plots of the relationship between the dataset used for SOC (%) model validation using RF and EK approaches and the measured data.

Table 2 Model performance of soil organic carbon (SOC) content (%)
Fig. 4
figure 4

Scatter plots showing the relationship between the dataset used for SOC (%) model validation using RF (blue dots) and EK (red dots) approaches and data measured in the Casanare flooded savannas, Colombia for 0-10 cm and 10-30. Dots represent the values for each variable while the lines represent the best fit for the correlation between them

Digital Soil Mapping of SOC Content

Prior to implementing the EK and RF models, environmental covariates for each approach were selected. Covariates selected for implementing the EK at 0-10 cm and 10-30 cm were land cover, moisture stress index for the dry period – MSI (dry), green-red vegetation index for the dry and transition: wet-dry period – GRVI (dry and trans), relative slope position – RSP, clay, silt, sand, and the SAGA wetness Index, which is a 'Topographic Wetness Index' (TWI), based on a modified catchment area calculation. For the RF model, an independent set of covariates was selected for each depth after the elimination of highly collinear covariates through the VIF assessment (see Table S1 in supplementary material). Figure 5 shows the top-five most relevant covariates for predicting SOC ranked by the %IncMSE for the RF model at each depth. The variable importance ranking identified local land cover, normalized difference vegetation index for the wet period – NDVI (wet), sand, and SAGA wetness index as common covariates for predicting SOC at both depths. Local land cover was found to be the most important covariate at 0-10 cm, contrary to the 10-30 cm depth in which it was placed at the bottom of the ranking. Normalized difference moisture index for the transition: wet-dry period – NDMI (trans), in turn, was the most important variable at 10-30 cm. NDVI (wet) and sand remained at the same position of the ranking at both depths, while the SAGA wetness index was more important for the 10-30 cm depth. Several covariates, such as local land cover, SAGA wetness index and sand, were utilized in both EK and RF approaches. Additionally, covariates regarding the vegetation properties were the most frequently used in both DSM approaches.

Fig. 5
figure 5

Importance of environmental covariates used in the RF models for the study area according to the mean %IncMSE (percentage increase in mean square error). High values represent a high importance of the covariates in predicting soil organic carbon (SOC) content (%). BSI: Bare Soil Index, NDVI: Normalized Difference Vegetation Index, NDMI: Normalized Difference Moisture Index, wet: wet period -> May (2018) - October (2018), trans: transition dry-wet: November (2018) - December (2018)

Maps of SOC content with a spatial resolution of 12 m developed using both the EK and RF methods at soil depths of 0-10 cm and 10-30 cm are shown in Fig. 6. In general, RF overestimated the minimum and underestimated the maximum SOC contents while EK underestimated both minimum and maximum (Table 3). Comparing between minimum values of the predicted SOC content maps (EK and RF) and the measured data, we observed smaller differences for EK than RF at both depths, while RF had smaller differences for predicting the maximum values. At 0-10 cm, both approaches presented similar meanw SOC content of 4.20% (RF) and 4.22% (EK), representing a higher meanw prediction in comparison to the measured meanw value (3.63%). At soil depth 10-30 cm, RF overestimated the meanw value compared to the measured value (difference of 0.28%), while EK slightly underestimated the meanw values (difference of 0.06%).

Table 3 Descriptive statistics of SOC (%) predicted using EK and RF approaches and measured data collected in the Casanare flooded savannas, Colombia for 0-10 cm and 10-30 cm

In general, the two approaches presented a similar pattern regarding the spatial distribution of SOC content (Fig. 6). The highest values of predicted SOC content were found at locations that experienced longer periods of seasonal flooding (darker areas) and the lowest values were predicted in the continental dunes with sandy soils and no seasonal flooding. However, there are differences in the predicted SOC values in some areas such as the large gallery forests located in the floodplains of the Ariporo and Meta Rivers where RF estimated higher values of SOC for both soil depths (0-10 cm and 10-30 cm). Predicted values of SOC in the hyperseasonal savannas tend to be lower for the RF than for the EK model at 0-10 cm; while at 10-30 cm, predicted SOC values across the hyperseasonal savannas are similar.

Fig. 6
figure 6

Spatial distribution of SOC (%) using two different DSM approaches (EK and RF) in the Casanare flooded savannas, Colombia at depth intervals: 0-10 cm and 10-30 cm

SOC Stock Distribution

SOC stocks over the entire 30 cm profile determined using field data and scaling by geomorphic feature and land cover type are summarized in Fig. 7. Comparisons between measured SOC stocks across the different geomorphic features and land covers showed significant differences (P < 0.05) between dip (111 ± 30 t ha-1) and the other features, while no significant differences were identified between talf (63 ± 24 t ha-1), rise (49 ± 28 t ha-1), and continental dune (20 ± 8 t ha-1). On an ecosystem basis, semiseasonal savannas (118 ± 34 t ha-1) differed significantly from forests (48 ± 26 t ha-1) and seasonal savannas (44 ± 25 t ha-1) and did not show significant differences with Moriche palm (88 ± 28 t ha-1) or hyperseasonal savanna (83 ± 29 t ha-1). Overall, higher mean measured values were attributed to semiseasonal savanna land cover and dip geomorphic features, where lower values of BD and higher values of SOC contents were identified (Table 1). Lower SOC stock values were attributed to seasonal savanna land cover and continental dunes, which were characterized by lower SOC contents and high BD.

Fig. 7
figure 7

Boxplot diagrams showing SOC stocks (t ha-1) distribution at soil depth 0-30 cm under different environmental conditions. Grey dots represent single values of sampled soils, the black horizontal line in the middle of the box represents the median value, lower and upper boundaries indicate the 25th and 75th percentiles, blue crosses represent outliers, vertical black lines (whiskers) denote the maximum and minimum values (excluding outliers). Lowercase letters denote statistically significant differences at P < 0.05 using the Dunn post-hoc test

Due to the slightly better capacity of EK to describe and map the spatial distribution of SOC content for the study area, the map of SOC stocks at 0-30 cm soil depth was developed using this approach and selected covariates described in “Digital Soil Mapping” section. The corresponding model performance parameters were RMSE: 26 -, MAE: 23, and R2: 0.69.

(Circles indicate the larger areas of higher SOC stock accumulation)

Predicted SOC stocks ranged from 6 to 210 t ha-1, with average stocks of 83.13 ± 24.32 t C ha-1 (Fig. 8). Similar to SOC content, high (150-180 t ha-1) and very high (> 180 t ha-1) SOC stocks were found in areas characterized by their proximity to water bodies, their location in floodplains with clay-loam soils, and the presence of long-term flooded conditions (circles indicate the larger areas of higher SOC stock accumulation). Medium-high values (120-150 t ha-1) were found adjacent to areas with higher SOC stocks. The hyperseasonal savanna land cover, which occupies the larger proportion of the study area, is characterized as a mix mainly between medium-high and medium SOC stocks (90-120 t ha-1) in dip features and the medium and medium-low (60-90 t ha-1) stocks in well-drained talf features. Low values (30-60 t ha-1) of SOC stocks were largely predicted in riversides along the Ariporo River and the Meta River as well as along other rivers, where coarser soil textures predominate. Very low values (< 30 t ha-1), in turn, were characteristic of areas where continental dunes (northeasterly elongated geomorphic features) and sandy patches predominate.

Fig. 8
figure 8

Map of SOC stocks (t ha-1) predicted from EK model at 0-30 cm soil depth in the Casanare flooded savannas, Colombia.

Using the land cover map (Fig. 1) as a mask, descriptive statistics of the predicted SOC stock map per land cover were calculated (Table 4). We found that the highest mean predicted SOC stocks values were in the semiseasonal savannas, followed by the hyperseasonal savanna, seasonal savanna, and forest, which combines riparian forest and Moriche palm. Total soil organic carbon stock predicted for the first 30 cm of soil depth in the entire study area (Fig. 8) was equivalent to 55.07 Mt of C. Hyperseasonal savannas contributed 65% (42.73 Mt) of the total stocks, while forest contributed with 22% (3.55 Mt), semiseasonal savanna with 7% (6.79 Mt), and seasonal savanna with 6% (2.00 Mt).

Table 4 Basic statistics of predicted SOC stocks (t ha-1) per land cover within the study area

Discussion

SOC Characterization

Carbon accumulation in tropical wetlands is typically the result of small differences between input and outputs, and is often associated with topography that concentrates water and creates anaerobic conditions that slow decomposition (Gumbricht et al. 2017; Paul 2016; Sjögersten et al. 2014). Carbon accumulation is also regulated by the types and quality of vegetation, particularly the proportion of plant-derived C and N that is incorporated into soil organic matter, and by soil matrix interactions that control its stabilization (Cotrufo et al. 2013; Paul 2016). Clay and silt fractions play important roles in stabilizing soil organic matter through both physical and chemical protection. We found strong associations between geomorphology, plant communities, and soil texture that affected the SOC concentrations in the flooded Llanos del Orinoco and in the surrounding ecosystems. There are few data from South American floodplain wetlands for comparison and available data do not represent systematic sampling, but we found that mean SOC contents are higher in the Casanare flooded savannas compared to the levels reported for flooded savannas in the Brazilian Pantanal (Vega et al. 2014), the Brazilian Cerrado (Wantzen et al. 2012) or the Bolivian Llanos do Moxos (Boixadera 2003), and they were similar to flooded savannas in Roraima, Brazil (Barbosa et al. 2012).

We found high SOC contents in locations characterized by clayey soils with high grass productivity (semiseasonal savannas), and depressions (dip features). SOC accumulation in these areas have been favored by frequent and excessive precipitation, high concentrations of sediments, and long periods of water accumulation (Barthelmes and Joosten 2018; Sarmiento 1984; Mora-Fernández et al. 2015). Sediment transportation processes occurred during the Tertiary and Quaternary deposited coarser materials, as a result of the erosion of the Andes mountains, along most of the river edges (Goseen 1964), forming banks that do not store much SOC. We found low SOC contents in riparian areas, in continental dunes, and in sandy deposits that are characterized by low water availability, coarser soil textures, and low-density vegetation (seasonal savanna). In riparian areas located specifically in silty-sandy banks, within the first 30 cm prevail dry conditions during a long part of the year, which affects the accumulation of SOC. Aeolian erosion by the Trade winds has acted as an essential factor in shaping the regional topography, particularly in the formation of continental dunes and sandy deposits (FAO 1965). Intermediate SOC contents were associated with Moriche palm and hyperseasonal savanna vegetation and loamy textured soils.

DSM Models Assessment

We compared two different DSM approaches (EK and RF) for predicting SOC content at 0-10 cm and 10-30 cm. The models performed comparably to map SOC variability in the study area and gave a similar spatial prediction pattern. The models both predicted high SOC contents in dip features, where soil textures are finer, flooded conditions last longer, and vegetation has higher biomass production rates, and low SOC values on the continental dunes with coarser soils, low-density vegetation, and no seasonal flooding. However, with validation against field data, we found that the spatial predictions obtained with the EK represented field conditions better. A key major difference between the modelling approaches was associated with the riparian areas along Ariporo and Meta rivers. The RF model predicted much higher values for these areas than the EK model, which showed predictions that were more consistent with field observations. Additionally, EK showed lower prediction errors than RF. Both models were sensitive to the presence of outliers. RF underestimated the highest values and overestimated the lowest, while EK underestimated the highest and lowest values and gave overall predictions around the mean values.

Lower performance in RF was attributed intrinsically to the methodology. The algorithm estimates the values for each pixel, based on an average of the results of all the independent trees created in the model (Callens et al. 2020), and generates an approximation to the field values based on averages of most similar areas, which limits its ability to describe local extreme values (Szabó et al. 2019). This situation probably caused the misprediction along the rivers, which may have been influenced by other areas with similar environmental conditions. RF has been used effectively in several studies for estimating SOC at landscape scale (Grimm et al. 2008; Wang et al. 2020; Zeraatpisheh et al. 2019; Zhou et al. 2019). Its prediction power has been shown to outperform other DSM approaches, specifically in larger study areas with a diversity of landscape features (Lamichhane et al. 2019), capturing most of the variance of the soil characteristics (Sindayihebura et al. 2017). Other advantages of RF include the modelling of high dimensional data, which would be difficult to afford with EK; the estimation of internal errors and the variable importance assessment; as well as a lower effort for data preparation and modelling time (Breiman 2001; Camera et al. 2017; Grimm et al. 2008). However, Ashtekar and Owens (2013) mentioned that a disadvantage of this method could be that it only works with soil-landscape relationships at specific locations and does not take into consideration extensive differentiation across the landscape. Additionally, Hengl et al. (2015) stated that RF is effective only within the range of values used in the training data, which provides irregular predictions for observations outside of the range (Terra 2017).

EK, in turn, is based mainly on qualitative and quantitative criteria provided by the knowledge of an expert, that takes into account soil-landscape relationships for a specific region based on experience, analysis from sampling data, and knowledge of interactions between covariates that influence the property distribution (Ashtekar and Owens 2013; Menezes et al. 2013; Ngunjiri et al. 2019). The use of EK led to a better control and implementation of inference rules that explain soil-landscape relationships, even with limited field data (Menezes et al. 2018). EK approaches have been satisfactorily applied by other authors (Akumu et al. 2015; Ashtekar et al. 2014; da Silva et al. 2016; Minai et al. 2021; Tsakiridis et al. 2019; Zhu et al. 2010). As evidence of its good performance, this approach has been widely used for the development of soil surveys by the US Department of Agriculture (USDA) (Zhu et al. 2001). On the other hand, EK is time consuming in terms of data selection, analysis, and processing. It requires knowledge about the mapping area and soil-landscape relationships, which may be limited for larger areas.

Drivers of SOC Spatial Variability

Similar to other tropical landscapes studies: (Diwediga et al. 2017; Hamzehpour et al. 2019; Yang et al. 2007), in the Casanare flooded savannas, the vegetation coverage, topography, and soil texture were found to be the most influential factors of the SOC distribution. The combination of these 3 features leads to different environmental conditions that consequently influence the spatial distribution of SOC, as evidenced in this study.

Vegetation coverage not only controls the inputs of organic material but also protects soils from erosion and conserves soil moisture (Lamichhane et al. 2019; Weil and Brady 2017). Several studies have represented the vegetation coverage in spatial models across the landscape employing categorical land cover maps or continuous vegetation indices derived from remote sensing (Gray et al. 2015; Minai et al. 2021; Mondal et al. 2017; Wan et al. 2019; Wang et al. 2020). Land cover maps and vegetation indices help to classify vegetation in terms of types, biomass production, residues, and soil textures based on changes in leaf colors and leaf density (Motohka et al. 2010). Specific combinations of these characteristics delimit areas from higher carbon fixation such as the semiseasonal savanna to lower carbon fixation such as the seasonal savanna. Vegetation indices also show where to expect more plant water stress and less soil moisture content (Elhag and Bahrawi 2017), being relevant not only for soil moisture variability but also for SOC spatial modeling (Welikhe et al. 2017). Topography plays an imperative role in SOC storage across the landscape (Patton et al. 2019; Wiesmeier et al. 2019). It widely describes water flow paths, water accumulation, soil erosion, and sedimentation, and consequently influences the soil moisture, microbial activity, and SOC accumulation (Sørensen et al. 2006; Yun et al. 2019). Topographic covariates that indicate potential moist/dry areas such as topographical wetness index improves SOC predictions (Adhikari et al. 2014; Lamichhane et al. 2019; Minasny et al. 2013; Siewert 2017; Taghizadeh-Mehrjardi et al. 2017; Wiesmeier et al. 2013).

Additionally, as identified in several studies (Funes et al. 2019; Hounkpatin et al. 2018; Yang et al. 2007; Zinn et al. 2005), the results also showed that texture was also key in explaining the distribution of SOC. Coarser soil textures were correlated negatively with SOC, while finer textures showed a positive relationship. Clay soils have a greater capacity to preserve or protect organic matter from microbial attack as well as to stabilize the organic matter through mineral–organic matter binding. By promoting soil aggregations, silt and clay contents improve SOC protection (Six et al. 2002; Xie et al. 2021), enhance soil moisture (Augustin and Cihacek 2016; Yang et al. 2016), water holding capacity, and plant available water. These conditions reduce or avoid SOC losses by soil erosion, enhance plant biomass production, and, consequently, conserve or accumulate SOC. On the other hand, sandy soils do not physically or chemically protect SOC, they favor nutrient losses by leaching affecting plant density, both of which resulting in low SOC levels (Hamzehpour et al. 2019).

As seen in this section, the identification and understanding of factors that control SOC, as gained from this study, is crucial to generate better SOC-related models, especially for grassland ecosystems with similar conditions around the world; improve those already developed, generate effective policies for SOC sequestration that contribute to mitigate climate change; as well as to improve soil health, biodiversity conservation and ecosystem services (Gray et al. 2015). Similar landscapes in South America lack carbon studies and their importance in terms of carbon accumulations is often understated. The application of these study’s methodolgies in other areas would help improve our understanding of their importance and target protection actions in priority areas.

SOC Stocks Distribution

The Casanare flooded savannas landscape holds important SOC stocks (55.07 Mt of C), which may be affected by land use changes and unsustainable management practices. While the pan-tropical wetlands map of Gumbricht et al. (2017) predicted the presence of peat deposits in this landscape, we found only mineral soils with high SOC contents. Nevertheless, the study area represents only around 4.43% of the flooded Llanos del Orinoco landscape, which accounts approximately 15 Mha (Hamilton et al. 2002). The numbers from our study would suggest that the total carbon stock across this vast ecosystem is on the order of 109 tons of C, with about 20 to 30% stored in flooded savanna ecosystems.

The expansion of agriculture and introduction of new land use practices could result in significant wetland loss, contributing to the already high levels of loss in South America and increasing CO2 emissions into the atmosphere (Guo and Gifford 2002; Laganière et al. 2010; Powers 2005; Zhou et al. 2019). Considered as the last agricultural frontier in Colombia, recent developments in the Colombian Llanos have seen the expansion of industrial crops (African oil palm, sugar cane, rice, and maize) as well as increases in grazing area, which have replaced gallery forests, wetlands, and native vegetation (Usma and Trujillo 2011). This alters both the hydrological regime and the composition and structure of plant communities across the landscape (Lasso et al. 2010). In particular, the wetlands in this landscape have been transformed into minimally productive rice fields, due to the expansion of agriculture in the region. Recurrent burning, a controversial land preparation technique to enhance pasture productivity, has devastated native palm ecosystems and forests, which have been colonized by grasses. In addition, pollution by fertilizers and the demand for water from crops threaten the natural state of flooded ecosystems, Moriches, and other aquatic communities in the savannas, putting their continued existence at risk (Lasso et al. 2014).

Land use change in the tropics and subtropics is globally the second largest CO2 source to the atmosphere, after combustion of fossil fuels (Barrezueta-Unda et al. 2019; Lal 2015; Van der Werf et al. 2009). The use of wetland soils for agriculture, native vegetation conversion, deforestation, and intensification of fires have largely contributed to the GHG emissions (Margono et al. 2014). In Peru, Moriche palm ecosystems are under increasing human pressure (Hergoualc’h et al. 2017; Horn et al. 2012) and we see evidence of this in our study area.

According to Mora-Fernández et al. (2015), the department of Casanare is one of the most transformed and less protected in the country (Romero Duque et al. 2018; Usma and Trujillo 2011). Land use changes, mainly caused by the accelerated expansion of agricultural activities, the increase of wildfires and the mechanical modification of the water regimes, have been the most harmful and stressful human activities for the ecosystems of the floodplain savannas (Mora-Fernández and Peñuela-Recio 2013). These changes cause the loss of its natural vegetation, the variation of the water dynamics, and the alteration of carbon cycle (Ostle et al. 2009). Although the study area represents only 15% of the total area of the department, the intensive pressure of human development could result in the reduction of its SOC stocks and the release of important amounts of GHG into the atmosphere (Ostle et al. 2009; Schreier et al. 1994; Sharma and Sharma 2022). At regional level, the impact of a large-scale land use conversions of the flooded Llanos del Orinoco ecosystem area (15 Mha) could transform this area in a future source of important global emissions if correct decisions are not taken regarding the land management of the region.

Acting on the commitments of the Paris Agreement, several countries have updated their GHG emissions mitigation strategies, aiming to reduce GHG emissions and keeping a rise in global average temperature below 2 °C (Carvajal-Agudelo and Andrade 2020). Given the importance of wetlands for local climate resilience and the significant carbon impacts, the protection and restoration of wetlands should be an integral part of national actions to address the climate change problem and support low emissions, climate resilient development. South America is considered as the wettest continent in the world, counting with 22% of its area covered by wetlands (Junk et al. 2013). Nevertheless, conservation of these important ecosystems is lagging in comparison to other regions due to the slow development of definitions, delineations, and classifications of their wetlands (Prahalad and Kriwoken 2010). The national inventories are not updated, and the ecological importance of these ecosystems is widely undocumented. Thus, the development of accurate SOC and wetland maps plays an essential role, since an adequate inventory and monitoring of these carbon-rich ecosystems will lead to better ecological characterization, quantification of SOC stocks and assessment of GHG emissions (Page et al. 2011), which will contribute to avoiding unwanted impacts on society and maintaining the environmental integrity of South American landscapes. Much attention has been paid recently to swamp forests, but other wetlands in South America are in similar situations. Therefore, it is important to increase the understanding of SOC sequestration and interactions, provide the appropriate assistance to farmers about land management practices and the subsequent creation and application of conservation strategies such as silvopastoral or agroforest systems (Amézquita 1999; Jadán et al. 2015), which have demonstrated a positive balance of SOC storage (Silva-parra 2018), and the definition of global GHC emissions mitigation strategies.

Conclusions

In this study, the status of SOC content and stocks at 0-30 cm soil depth in the Casanare flooded savannas was assessed. Results demonstrated the presence of carbon levels lower than expected, but high relative to other South American flooded savannas. The spatial variability and driving factors of SOC content were evaluated through the implementation of two DSM approaches: Expert Knowledge (EK) and Random Forest (RF). Although both DSM approaches performed very well, EK was considered slightly superior to predict SOC across the study area. Vegetation coverage, topography and soil texture were the most relevant factors in explaining the spatial variability of SOC content, attributing higher SOC contents to dip features, finer soil textures, and high-density vegetation.

Our study showed very high carbon stocks in the study area and a high potential of GHG emissions associated with intensive agricultural development in the region, in addition to loss of large areas of wetlands. SOC stock quantification indicated the importance of maintaining or even improving the carbon sequestration in the area to avoid carbon losses and, consequently, increasing CO2 emissions to the atmosphere. Inappropriate land and soil management of these ecosystems could hamper the efforts of Colombia and the global community on carbon sequestration and reduction of CO2 emissions. Additionally, these findings reinforce the view that wetlands, as the Casanare flooded savannas and similar areas elsewhere in South America, are relevant carbon storage ecosystems that must be considered of worldwide interest and protection.