1 Introduction

Increasing soil organic matter can mitigate climate change by sequestering atmospheric carbon (Batjes 2019; Bossio et al. 2020; Amelung et al. 2020; IPCC 2021). Agricultural soils are of particular interest as they have undergone significant anthropogenically induced changes (Quine et al. 1997; Van Oost et al. 2007; Sanderman et al. 2017). However, effective management could restore and increase the carbon reserves (Spencer et al. 2011; Batjes 2019; Bossio et al. 2020). Enhanced amounts of soil organic carbon (SOC) can have co-benefits such as enhanced water retention, higher biodiversity, and higher resilience to climate-change induced droughts (Guillaume et al. 2022). From an environmental and economic policy dimension, there is increasing interest in instrumentalizing agricultural soils for enhanced natural carbon sequestration (Sikora 2020). There are numerous studies and initiatives that attempt to instrumentalize soil carbon stocks to actively offset carbon emissions, also referred to as carbon farming (Spencer et al. 2011; Black et al. 2020). In order to ensure the long-term removal of carbon from the atmosphere, the carbon must stay in the soil for an extended period of time (concept of permanence, Lutzow et al. 2006, Oldfield et al. 2022a). Land users can actively contribute to the increase of their carbon stocks by changing their soil management, e.g., by ceasing tillage or changing fertilizer use (the concept of Additionality, Black et al. 2020). Protocols are emerging to facilitate Monitoring, Reporting, and Verification (MRV) in the framework of carbon farming, but challenges remain regarding robust scientifically backed methods and the documentation on permanence, additionality, and leakage (Oldfield et al. 2022a). In light of current policy developments such as the European Green Deal and the EU climate action on sustainable carbon cycles, the importance of robust carbon monitoring is likely to increase in the coming years (Elkerbout 2020; Amelung et al. 2020; The European Commission 2022).

The main challenge in instrumentalizing soils to sequester atmospheric carbon is the implementation of a method which is robust, affordable, and scalable and which is coupled to feasible land-use management advice (Spencer et al. 2011). There are methods which assess soil carbon stocks using satellite data only (Köchy et al. 2015; Poggio et al. 2021), but they are limited with regard to their resolution and their ability to monitor changes over time on field and farm levels. For example, for SoilGrids, this is 250 by 250 m, which constitutes a resolution which is too coarse for numerous fields to accurately assess carbon stocks for carbon credit certification (Black et al. 2020). An additional drawback of satellite-based global models is that these models are geared toward optimal predictions on a global or national scale, limiting the reliability on farm-level (Poggio et al. 2021). Remote sensing can be supplemented by field-based measurements using wet chemistry (Van Der Voort et al. 2019). Alternatively, it can be supplemented by proximal sensing methods such as soil spectroscopy (Bellon-Maurel and McBratney 2011; Soriano-Disla et al. 2014; Gobrecht et al. 2014; Shen et al. 2022). The high cost of wet chemistry measurements constitutes a major hurdle for the implementation of carbon farming (Kragt et al. 2012; Alexander et al. 2015; Tang et al. 2016). An additional limitation of wet chemistry is that lab facilities are not readily accessible and available everywhere, in particular in developing countries. Soil spectroscopy methods are gaining ground rapidly in agricultural sciences as key tools to map soil properties rapidly and for large areas without the need of wet chemistry measurements (Nocita et al. 2015; Smith et al. 2020; Trontelj 2021). However, these types of measurements are associated with lower accuracy as compared to classical wet chemistry measurements (Soriano-Disla et al. 2014). Fusing of both remote and proximal sensing data can combine the strength of these approaches and deliver cost-effective soil mapping (Asgari et al. 2020). Sampling strategies are also key to accurately map soil carbon stocks as they must capture the spatial heterogeneity of an area (Goovaerts 1998; van der Voort et al. 2016). Robustly mapped soil carbon stocks are needed in order to to capture the temporal change and assert permanence of carbon and the impact of additionality (Van Der Voort et al. 2019; Smith et al. 2020; Oldfield et al. 2022a, b). Grid-based sampling (e.g., measuring every 10 m) yields robust estimates of spatial heterogeneity and can assess temporal changes but is highly labor intensive (Bivand et al. 2013; Nussbaum et al. 2014; van der Voort et al. 2016; Van Der Voort et al. 2019). Alternatives to grid-based sampling methods can be leveraged to capture similar levels of heterogeneity with a lower average sampling density such as with conditioned Latin hypercube sampling (cLHS) (Brus 2019; Minasny and McBratney 2006; Yang et al. 2016). Machine learning can be used to predict patterns of carbon stocks and reduce the need for additional sampling (Bivand et al. 2013; Nussbaum et al. 2014; Smeaton et al. 2021). However, overfitting in machine learning can limit the scalability of approaches (e.g. with random forests, Bivand et al. 2013). When an overfitted model is used, it may be applicable to the region it was trained on but not to other regions and is therefore not scalable. Furthermore, it is key that the best-performing machine-learning model is selected in order to get the optimal results and lowest possible residual errors (Padarian et al. 2020; Khaledian and Miller 2020). Changes in carbon stocks are optimally demonstrated at the decision-making level, i.e., at the farm level, but presently, this is challenging due to the significant cost associated to MRV processes (de Gruijter et al. 2016). This research gap can be addressed by developing methods which can determine soil carbon using a robust, affordable, and widely applicable approach which is useable at the farm-level.

Our objective in this study is to develop and test a method which can facilitate carbon stock monitoring in a wide range of settings. This approach leverages available satellite, on-field soil spectroscopy measurements, and machine learning techniques to create efficient sampling protocols and generate carbon stock estimates. This approach works on the farm-scale level and is tested on a range of arable fields in the USA on a range of soil types. In order to test the developed method, the optimal sampling density is determined and the error of carbon stock estimates are calculated for the farm and field level.

2 Material and methods

2.1 Carbon stock assessment protocol and field area

This section describes the key steps of the Wageningen Soil Carbon STOck pRotocol (SoilCASTOR): (1) the selection of spatial covariates from (satellite) data sources, (2) the selection of sampling locations, (3) the soil spectroscopy measurement of SOC in the field, (4) the training of a model on the available data, and (5) the calculation of the carbon stock with uncertainty estimates (Fig. 1). Code developed to facilitate this approach was developed in the statistical language R in RStudio (RStudio 2021, version 2021.09.0). The proposed protocol was tested in two farms in the states of Arkansas and Iowa in the USA. The fields locate between 35° and 41.3° latitude and −92.0° to 91.6° longitude (Fig. 2). The Arkansas farm consists out of five fields with a sum of ~140 ha, and the Iowa farm consists of three fields and a total of ~95 ha. The smallest field is ~10 ha and the largest ~64 ha. The land use of all fields is agricultural, and all soils are Alfisols. In Iowa, the fields are characterized as a silty loam to a silty clay loam. The parent material is Pleistocene loess. In Arkansas, the soils are also classified as silt loam. The parent material is also loess with occasional glacial deposits (Boiko et al. 2021).

Fig. 1
figure 1

Key steps of the Wageningen Soil Carbon STOck pRotocol (SoilCASTOR) method are (1) collection of covariates from (satellite) data sources, (2) the selection of sampling locations with conditioned Latin hypercube sampling (cLHS), (3) measurement of soil organic carbon (SOC) with the near-infrared (NIR) scanner in the field, (4) modeling of SOC using machine learning (ML), and (5) soil carbon stock estimates with uncertainties.

Fig. 2
figure 2

Overview of sampling locations in this study in the US states Iowa and Arkansas. Map made with OpenStreetMap.

2.1.1 Selection of spatial covariates

The method utilizes all available (satellite) data sources and indices to find effective covariates to predict soil carbon stock (Fig. 1). These covariates were selected based on factors that are known to correlate with soil carbon stocks, such as vegetation and soil moisture (Jobbagy and Jackson 2000; McBratney et al. 2003; Seneviratne et al. 2010; Wang et al. 2021). Data sources encompass the Sentinel-1 and Sentinel-2, digital elevation map, and ISRIC SoilGrids (Escadafal 1989; Nellis and Briggs 1992; Marsett et al. 2006; Van Doninck et al. 2012; Zakharov et al. 2020; Wang et al. 2021; Poggio et al. 2021). Relevant covariates were extracted for points on a 0.001 by 0.001 degree grid which corresponds to a ~10 m by ~10 m grid in the designated sampling areas (McBratney et al. 2003). To approximate vegetation, both the transformed vegetation index (TVI) (Nellis and Briggs 1992) and satellite adjusted total vegetation index (SATVI) from Sentinel-2 were utilized (Marsett et al. 2006). In order to assess soil moisture with Sentinel-1, the volumetric soil moisture (VSM) following Zakharov et al. (2020) was used. Covariates for SOC encompassed of Sentinel-2 spectral images and shortwave infrared (SWIR) bands (B11 and BI2) and second brightness index band BI2 (Escadafal 1989; Wang et al. 2021). From ISRIC SoilGrids, relevant covariates such as clay content, cation exchange content, and bulk density were extracted (Poggio et al. 2021). This approach to extract covariates relevant to predict SOC content is globally applicable and modular, i.e., it can take up more (local) data sources and covariates when available. In order to design a sampling scheme for each site, the fields are divided in a grid of ~10 m resolution. Each point becomes a potential sampling location (SI Fig. 1). Subsequently, for each grid point, all covariates were retrieved. Additional details on the covariates can be found in Supplemental Information (SI) Table 1.

Table 1 Overview of best performing machine learning models, evaluated by residual mean square error (RMSE) and r2.

Data sources were cleaned in order to avoid data of insufficient data coverage. Covariates were excluded when the variable is available for less than 99% of the potential sampling points. This yielded a total of 45 variables. For the variables for which there was sufficient data, missing values were imputed with the median values of the covariate. These missing values occur only in a few cases, and imputation is needed to avoid the removal of valuable covariates due to single missing data points.

2.1.2 Sample location selection using cLHS

Conditioned Latin hypercube sampling (cLHS) was used to select optimal locations of field measurements (Minasny and McBratney 2006; Brus 2019; Saurette et al. 2022; Fig. 1). With the cLHS, a subset of the potential sampling locations is selected using a stratified random procedure based on the multivariate distribution of the covariates (SI Fig. 1). The asset of this method is that, for example, two points that are similar in the multidimensional covariate space are not both selected for sampling. This allows for a lower sampling density than classical grid-based sampling. The selected sampling points by cLHS effectively capture the range of covariates of the plot. The optimal sampling density of the cLHS was evaluated by testing the effects of different sampling density on uncertainty in SOC prediction (see Section 2.1.5).

2.1.3 Field measurements

Field samples were taken at a depth between 0 and 30 cm with an open spiral soil auger, and the SOC was measured using the AgroCares near-infrared (NIR) scanner (AgroCares 2022) in Iowa and Arkansas. Per location a single sample was taken. The NIR scanner was trained on a dataset of ~18,000 lab-based measured samples using a one-dimensional convolutional neural network (Tsakiridis et al. 2020; Tsimpouris et al. 2021; Yang et al. 2020a, b). The exact sampling locations were given as XY-coordinates, provided by the cLHS method. If no suitable spot is found at the location, the field sampler could deviate up to 2 m around the point. If sampling was still not possible, the sample location was skipped. Sampling is done if possible on bare land, and any plant debris is removed if necessary. Stones exceeding >2 mm, and roots are removed (Van Der Voort et al. 2019; Walthert et al. 2002). The sample is not dried before the measurement. After thoroughly mixing, the soil sample was scanned with the NIR scanner, and data was immediately transferred digitally. Bulk density was not determined in the field, but estimated from soil organic matter and clay content using a pedotransfer function calibrated on arable soils in the Netherlands (Commissie Bemesting Akkerbouw en Vollegrondsgroententeelt 2022). Additional details on the sampling procedure can be found in the SI.

2.1.4 Modeling carbon stocks

Model selection for SOC stock estimates

In order to predict the soil carbon content for each point in the ~10 × ~10 m grid, machine learning (ML) models were built using the field measurements of SOC (%) measured with the NIR scanner and the covariates. In order to ascertain that the optimal model was used, we evaluated the use of two model target variables and a range of ML models and data transformations.

The two target variables that were evaluated are firstly the SOC (%) of the NIR scanner and secondly the difference between the measured SOC (%) and ISRIC SoilGrids SOC (%) (hereafter referred as SOCdif). The SoilGrids SOC is the SOC estimated as predicted by a global model (Poggio et al. 2021). The SoilGrid SOC content for 0–30 cm was calculated from the SOC content of 0–5 cm, 5–15 cm, and 15–30 cm, weighted by the depth of the three soil layers. The rationale of using SOC as a target variable is that the model optimizes for the locally measured SOC values. The rationale of using the SOCdif as target variable is that the NIR scanner can capture local heterogeneity of SOC and thereby it can fine-tune the global estimates of SOC. Models were built on the farm level. The tested algorithms are linear regression, partial least square regression, ridge regression, lasso regression, elastic net regression, decision trees, and random forest regression (Bivand et al. 2013). The applied data transformations on the target variable are log-transformation, box-cox transformation, standardized (value minus mean divided by the standard deviation), and no transformation. Subsequently, tenfold cross-validation on all NIR-measurements (n=205) was used to evaluate the performance of each model in the form of residual mean square error (RSME) of SOC in the validation datasets. The optimal approach was selected after comparing model performance (RMSE) of the various model options with differing in the target variable, algorithms, and data transformations. With the best model (with the lowest RMSE), SOC (%) for all grid points were predicted.

2.1.5 Estimating carbon stock with uncertainties

Conversion from SOC content to C stock

In order to attain carbon stock estimates for the individual fields, the soil carbon content of top 30 cm (g C kg−1) was converted to a carbon stock (g C 100 m−2) by multiplying the soil C content (g C kg−1) with the bulk density (kg m−3), the depth of the soil (m), and the area of the grid cell (100 m2). This amount was then converted to the unit of 10,000 m2 or one hectare, to align with carbon credit certification protocols (Black et al. 2020). Finally, field and farm C stock (in the unit of ton C) was calculated as sum of soil C of all grids located within the field or farm, respectively.

Estimate of uncertainty associated with scanner error

Estimates of carbon stock are unavoidably associated with errors. To assess a change in soil carbon content, e.g., in the context of carbon farming for carbon credits, it is crucial to quantify the errors (Minasny et al. 2017; Black et al. 2020). Here, we assessed the error attributed to the field measurements of SOC with the NIR scanner and quantified how the error propagates when estimating the carbon stock. Based on previous validation studies with >18,000 independent sample locations all over the world, we conservatively assume that the measured SOC value of NIR scanner is associated with a 30% error rate (i.e., the error follows a normal distribution with SD 30%; thus, the majority (68.3%) of samples is associated with an error ranging between −30% and +30%) over the range of 5 to 100 g C kg−1 for a single SOC measurement. This is a conservative assessment; in reality, the error can be lower (see SI for details). The effect of the NIR scanner error on C stock estimate was quantified with Monte Carlo simulations, which is in line with current carbon credit certification protocols (Black et al. 2020). A Monte Carlo simulation was applied on the NIR field measurements (n=205) with a random error on the SOC (mean 0%, SD 30%) for a hundred times. For each iteration, the whole procedure of C stock estimate (i.e., model selection, grid-level SOC prediction with the best model, and calculation of C stock on field and farm level) was repeated. Subsequently, the uncertainty range of the field-level and farm-level C stock was quantified. Additional details can be found in the SI.

Evaluation of sampling density

To explore optimum sampling density of the on-field soil spectroscopy measurements, field-level carbon stock was estimated with varying numbers of sampling points. By reducing the sampling density to the minimum, the costs associated with carbon farming can equally be minimized (Kragt et al. 2012; Tang et al. 2016). For each field, a fraction of cLHS-derived sampling points was randomly selected from the full set of the sampling points of the field. The tested fractions were 10, 20, 30, 40, 50, 60, 70, 80 and 90%. This approach was repeated 100 times, and the relative error (coefficient of variation, CV, in percentage) in carbon stock was evaluated. This approach ensures that the minimum requirement of sampling density for a field can be determined in a region where prior knowledge of the SOC in the neighbor fields is reasonably available. In other words, to evaluate the optimum sampling density of field A, data of other fields of the same farm (i.e., fields B–E) were leveraged to build the machine learning model. A random error of 30% on the measured SOC was added for all simulations. For each simulation, a certain fraction of the sampling points were randomly chosen for 100 times. Subsequently, 100 different field-level carbon stock estimates were computed. To evaluate the appropriateness of that sampling density, the CV of those 100 carbon stocks were calculated. A low CV value indicates that the field carbon stock estimate is similar among different subsets of sampling points, indicating that the carbon stock can be estimated robustly with the sampling density. A flowchart exemplifying these steps can be found in the SI.

3 Results and discussion

3.1 Carbon stock assessment protocol

3.1.1 Covariates

The covariates encompass indicators of soil carbon content (e.g., vegetation indexes and soil moisture, Van Doninck et al. 2012; Escadafal 1989; Marsett et al. 2006; Nellis and Briggs 1992; Wang et al. 2021; Zakharov et al. 2020). However, additional parameters, related to, e.g., the land management, ground water level, and even fertilizer level, are not included, even though they could be potentially important covariates for soil carbon stock (Minasny et al. 2017). The method here is modular, i.e., it could absorb and utilize other covariates when available. Potentially, SoilCASTOR can improve when additional covariates are included. More research is needed to ascertain this.

3.1.2 cLHS Sampling design and sampling density

The cLHS method and required sampling density was evaluated by comparing model performance (exemplified by the coefficient of variation, CV in percent) across a range of sampling densities (0.1–1.0 samples per hectare) (Fig. 3) (Bivand et al. 2013; Brus 2019; Minasny and McBratney 2006). By evaluating which sampling density is required to get optimal results, efficient and effective sampling campaigns for carbon farming can be set up. A major cost component is the labor investment of fieldwork (Gobrecht et al. 2014; de Gruijter et al. 2016). If model performance plateaus at a certain sampling density, additional sampling is not necessary. In total, 205 field samples were taken, 141 in Arkansas, and 64 in Iowa (Fig. 2). Details on the number of samples per field can be found in the SI. Results show that the cLHS sampling-based model results optimize (lowest CV %) at a sampling density of around 0.5 samples per ha for the majority of the fields (Fig. 3). In other words, for every 2 ha, about one sample is required to achieve a robust estimate of the farm C stock with a deviation less than 5%. Additional measurements do not add much to the improvement of the model. This implies that with relatively low labor time investment (15-30 min per hectare); large areas can be covered, overcoming a key obstacle in the implementation of carbon farming (Evans et al. 2015; Tang et al. 2016). Although cLHS is established in soil mapping (Yang et al. 2016, 2020b), direct comparisons of cLHS and grid- based studies are rare and require more extensive research (Saurette et al. 2022).

Fig. 3
figure 3

The coefficient of variation (CV%) in predicted field-level C stock versus the number of samples per hectare taken. Fields A–G are indicated by different colors and annotated in the figure. The CV is a measure of model precision, the lower the better. The CV levels of around 0.5 sample per ha.

3.1.3 Fieldwork and the NIR-based scanner

NIR-based scanners lend themselves to carbon stock analysis because they allow for high throughput (Bellon-Maurel and McBratney 2011). However, they are impeded by high associated errors (Bellon-Maurel and McBratney 2011; Gorbrecht et al. 2014). For this project, the AgroCares HandHeld NIR scanner was used, which leverages a measurement data exceeding 18,000 samples (AgroCares 2022). A conservative estimate of the error (mean 0%, SD 30%) for the range of 5 to 100 g C kg−1 is assumed. This error is propagated and resulted in a range of expected stocks. For example, for field A, average stock is 20.1 tC ha−1 within a range of 18.0–22.4 tC ha−1 (details in Section 2.1.9). Errors on GPS locations are minimal (max 2 m) as the sample is measured in the field and data is entered directly and is automatically associated to the correct location. Additional information on the HandHeld Scanner can be found in the SI 4. The SoilCASTOR protocol is also modular when it comes to the implementation of the scanner; thus, if more optimally performing NIR scanners become available, they can be implemented.

3.1.4 Machine learning model

The performance of the range of ML models (linear regression, partial least square regression, ridge regression, lasso regression, elastic net regression, decision trees, and random forest regression) and transformations (log-transformation, box-cox transformation, standardization and no transformation, Bivand et al. 2013) and target variables (SOC and SOCdif, Poggio et al. 2021) were compared (Fig. 4 and Table 1). The optimally performing algorithm was the random forest regression model with a box-cox transformation on the target variable SOCdiff (residual mean square error, RMSE = 0.240, r2 = 0.76). The random forest scored highest for both the target variables SOC% as well as the residual (difference between measured SOC% and ISRIC SoilGrids) (Fig. 4 and Table 1). The top-five performing models have an RMSE ranging from 0.240 to 0.253 and an r2 from 0.76 to 0.73, respectively (Table 1). This approach of running and evaluating multiple machine learning models, transformations, and target variables is comprehensive (Fig. 4). This multi-pronged approach leaves no stone unturned and allows for the selection of the model which is optimal in that instance (Padarian et al. 2020; Khaledian and Miller 2020). When this model is applied to other regions or datasets, other models may be more optimal, in which case the best performing model will be automatically selected.

Fig. 4
figure 4

Overview of tested machine learning (ML) models, transformations, and both target variables, evaluated by r2 and residual mean square error (RMSE). Carbon prediction means SOC as target variable; residual prediction refers to SOCdif. Abbreviation dt is for decision trees, elastic for elastic net regression, lm for linear regression, pls for partial least square regression, rf for random forest, and ridge for ridge regression. Transformations are box-cox (red); logtransform (blue); none (green); and standardized (yellow). Standardized is calculated as (value-mean)/standard deviation.

3.1.5 Carbon stocks and variability

The Arkansas and Iowa farms carbon stocks in the top 30 cm range from 32.8–38.7 and 25.4–29.6 tonC per hectare, respectively (Table 2). Carbon stocks in the fields for the same depth interval range between 14 and 84 ton C ha−1, with a mean of 34 and a median of 30 tonC per hectare (Table 2). This puts the stocks in the range as found in other studies (Jobbagy and Jackson 2000; Köchy et al. 2015; Nussbaum et al. 2014). There are significant inter and intra-field differences (Fig. 5). The carbon stocks were determined both per farm, per field and per hectare (Table 2). The carbon stocks per farm differ strongly, with Arkansas having a higher stock per ha (35 tC ha−1 ) and holding nearly double total carbon stock (5059 tC) as compared to Iowa with a lower stock per ha (28 tC ha−1) and lower total stock (2630 tC). When looking at individual fields, the ranges are even greater. Field C in Arkansas has the lowest stock per ha with an average of ~19 tC ha−1 contrasted by field B in Arkansas with the highest stock per ha at ~55 tC ha−1. The coefficient of variation (CV; standard deviation divided by the mean), a metric to evaluate the variability, is 4.3% for the Iowa and 5.0% for Arkansas farms. The CV of individual field-level C stock is slightly higher than for the composed farms and ranges between 5.4 and 9.4%. This shows that in order to gain a robust understanding of carbon stock dynamics, and in particular the element of carbon leakage, fields need to be individually assessed (Black et al. 2020, FAO 2020). These values are still below the thresholds established by most accreditation protocols (Black et al. 2020). The method captures small-scale (tens of meters) variability which can be matched with field-based assessments. The variability is strongly dependent on the field (Fig. 6). The bulk density was derived using the pedotransfer function (Commissie Bemesting Akkerbouw en Vollegrondsgroententeelt 2022). Potential errors on bulk density estimates were not available and not propagated. NIR measurement could potentially also be used to determine bulk density, but is not yet known how robust these results would be. More work is needed to ascertain this (Bellon-Maurel and McBratney 2011). To our knowledge, the SoilCASTOR approach is novel in the way it combines multiple data sources, comprehensive machine learning and offers robust soil carbon stock estimates.

Table 2 Overview of carbon stocks per farm and field. tC stands for a ton (1000 kg) of carbon. The tC range indicates field-level C stock uncertainty range (5th and 95th percentiles) associated with near-infrared (NIR) scanner error.
Fig. 5
figure 5

Overview of carbon stocks (tC ha−1) for all fields. Subplots indicate for (a) field A, (b) field B, (c) fields D, C, and E (clockwise), (d) field F, and (e) fields H (left) and G (right). Note the total area in fields ranges from 10 to 64 ha. In addition, legends differ per field for visual clarity.

Fig. 6
figure 6

Overview of variability of the carbon stock in each field. The box represents the 75th up to 25th percentile range; the line represents the median. The error bars indicate the 1.5 interquartile range above and below the 75th and 25th percentile ranges. Points indicate outliers.

3.2 Upscaling and integration into carbon farming

In order to be used effectively for soil carbon stock-derived carbon credits, quantification methods need to be widely applicable, compatible with carbon credit requirements and find the balance between cost and benefit for the land user.

3.2.1 Applicability on the globe and across a range of scales

The covariates used as inputs for SoilCASTOR steps one (selection covariates) and two (sampling location selection) are globally available. However, it is possible that there are additional regional datasets available, e.g., for the EU sphere (Tóth et al. 2013). Depending on other geographic areas, the available covariates may subsequently differ from the situation here. The set-up of the model is modular, meaning that if more rich datasets are available, they can be incorporated. This may, however, have implications for appropriate sampling densities, relevant covariates, and target variables and therefore needs to be assessed separately. Additional studies on farms in other regions and countries are needed in order to evaluate the optimal sampling density and associated accuracy. Furthermore, comparisons between cLHS-powered and traditional grid-based soil carbon strategies should be done in order to evaluate and compare the most robust and scalable solutions (Saurette et al. 2022). Scalability could be further limited because a key requirement for SoilCASTOR is that it is necessary that a field analyst go to the field and samples.

3.2.2 Carbon credit requirements

In order to transform the carbon monitoring data so it can be reported and verified by an independent organization, it is necessary to propagate error. Furthermore, it is key that there is a distinguishable difference between the initial and altered carbon storage of the soil (Black et al. 2020; Verra VCS 2020). Within this study, we followed the approach of Verra VCS VMD0053 on the model calibration (Verra VCS 2020). The error propagation gave a range for each field or farm (e.g., Arkansas farm, average stock ~35 tC ha, ranging from ~33–39 tC ha−1). Impacts of adjusted land managed changes (Lessmann et al. 2022) need to be significant enough (e.g., bigger than 0.5 tC ha−1 per year) for a 5-year period in order to cause a measurable and discernible difference over time. Within carbon certification processes, there are penalties for high uncertainties (Black et al. 2020; Verra VCS 2020). The present research project only envelops a single time-point, and a time series is necessary with samples taken in the same fashion in order to assess changes in soil carbon over a period (Van Der Voort et al. 2019). Increased carbon stocks also need to remain stored for more than a transient time period. In other words, permanence must be asserted (Oldfield et al. 2022a). In this context, it is crucial to consider the turnover times of soil organic matter (of added carbon) in both in the top and deep soil (Van der Voort et al. 20162019). Turnover in particular of labile compounds can be rapid leading to positive bias of mitigation measures designed to store carbon in soil (van der Voort et al. 2017; Berthelin et al. 2022). In order to effectively develop impactful land use management changes, soil science and biogeochemically driven modules should be incorporated into SoilCASTOR that can forecast impacts of these changes on soil carbon stocks. Furthermore, robust time-series sampling would need to be undertaken 5 to 10 years from now to evaluate the intermediate changes (Van Der Voort et al. 2019). Examples could, e.g., build on RothC (Jenkinson and Coleman 2008) and utilize radiocarbon both for decadal as well as millennial carbon turnover (Graven 2015; Galvez et al. 2020).

3.2.3 The carrot and stick of carbon credits

Actively brought-on changes in land-use practices (additionality) have been shown to positively impact carbon stocks (Lessmann et al. 2022). However, in order to incentive land users to implement carbon farming, the investment must be offset by the carbon credit value (Kragt et al. 2012; Tang et al. 2016). The SoilCASTOR calculations can be done rapidly and at low computational cost, but a time investment of ~15 min ha−1 remains a source of cost for carbon farming. However, proximal sensing data from the field remains a requirement for numerous carbon verification projects (Black et al. 2020). Nonetheless, by eliminating wet chemistry measurements, carbon farming may become more reachable for a range of farmers (Tang et al. 2016). More research is needed in this direction in order to ascertain that carbon farming is feasible with the socio-economic toolboxes that are present.

4 Conclusion and outlook

This paper presents the SoilCASTOR method which can be applied to determine the carbon stock robustly to a range of (agricultural) soil types with a relatively low measurement cost and time investment. The method presents a novel approach and leverages satellite data, field-based NIR-scanner measurements of SOC and machine learning to get optimal estimates of soil carbon stock. Therefore it can be widely instrumentalized to assess potential changes in carbon stocks in the framework of carbon farming. The carbon stock in the top 30 cm for the fields analyzed ranges between 19 and 55 t C/ha and the stock could be determined up to 10 m precision.

As an outlook, it will be key to include this method in certified soil carbon sequestration offsetting protocols, so it can be fully integrated in MRV (Black et al. 2020). It would especially be key to connect it to a module which can give advice on how to increase stocks (e.g., leveraging RothC, Coleman and Jenkinson 2014). Additionally, it would be insightful if it were applied to a wider range of fields (e.g., grasslands) and over a number of years (e.g., resampling after 3–5 years) Also, it would to be helpful to directly compare cLHS sampling to grid-based sampling strategy in the context of carbon stocks (Saurette et al. 2022). Uncertainty estimates could be improved when the uncertainty range for bulk density estimates becomes available. Furthermore, it could be investigated if this approach would also be appropriate for regional approaches (including multiple farms) instead of the current single farm-level focus. Another key element which can be further investigated is the maximum level up to which soil carbon stocks can be increased (Stewart et al. 2007; Castellano et al. 2015).