Introduction

Carbon (C) inputs to soil, many of them in the form of dissolved organic carbon (DOC), are expected to increase from CO2 fertilization (Drake et al. 2011; Jiang et al. 2020; Palmroth et al. 2006) and improved management practices (Cardinael et al. 2018; Maillard et al. 2017; Poeplau and Don 2015) such as those outlined in the ‘4 per 1000’ initiative, which aims at removing atmospheric CO2 by increasing soil C sequestration with an emphasis on long-term storage (Minasny et al. 2017). In addition, warming soils are likely to experience increased decomposition of particulate organic matter, leading to some increase in DOC production (Bengtson and Bengtsson 2007; Fu et al. 2019; Reynolds and Fenner 2001). This additional DOC can be lost through decomposition or runoff, or can sorb to mineral particles as it percolates down the soil column (Kaiser and Kalbitz 2012), thereby increasing the size of the total soil C pool with long-term storage characteristics. Currently, there is no global-scale estimate of how much additional C can be stored in soils through sorption of DOC. This lack of knowledge prevents an understanding of the upper limit of stable carbon storage in soils that could inform policies aiming to increase soil C concentrations.

Different soil C pools and stabilization mechanisms exist, but a pool of C considered to be particularly stable is the mineral-associated fraction, measured using density or size fractionation (Cotrufo et al. 2019; Kleber et al. 2015; Poeplau et al. 2018). During the 1970s and 1980s, soil organic matter was thought to be stable to the extent that it contained chemically recalcitrant compounds (Kögel-Knabner 1986; Kögel-Knabner et al. 1988). Over the next three decades, researchers demonstrated that those so-called recalcitrant compounds can in fact be degraded under the appropriate conditions (Gleixner et al. 2002; Rasse et al. 2006), and identified other mechanisms that stabilize soil organic matter and prevent it from being respired: interaction with minerals (Gleixner et al. 2001; Torn et al. 1997), physical protection within aggregates (Tisdall and Oades 1982), and environmental limitations of microbial activities (Schmidt et al. 2011).

One key mechanism driving interactions with minerals is sorption, the formation of chemical associations between soil minerals and organic compounds. Sorption can protect organic C from decomposition, even if some of the compounds in organic C are labile or young in age (Eusterhues et al. 2003; Kaiser et al. 2002; Kiem and Kögel-Knabner 2002; Kögel-Knabner et al. 2008; Porras et al. 2018; Schmidt et al. 1999). Sorption is now considered to be one of the dominant mechanisms leading to a mineral-associated C pool (Cotrufo et al. 2013; Schmidt et al. 2011) and its importance is beginning to be recognized and explicitly incorporated in soil C models (Ahrens et al. 2015; Robertson et al. 2019; Sulman et al. 2018).

Laboratory batch isotherm experiments measure the equilibrium partitioning of C between the solid and solution phases, controlled by solution ionic strength, chemical composition, mineral surface area, and soil properties such as pH and mineralogy. While some simulation models represent these organo-mineral associations with a linear equation (Camino-Serrano et al. 2018; Nakhavali et al. 2017), the non-linear Langmuir equation is more realistic (Abramoff et al. 2018; Ahrens et al. 2015; Robertson et al. 2019; Tang and Riley 2015; Wang et al. 2013) because of its ability to capture the asymptotic behavior observed in laboratory sorption experiments. The Langmuir equation assumes direct and reversible association of a solute, here DOC, with a surface that has a limited saturation capacity (Langmuir 1918). Given measurements of Q, the amount of sorbed DOC, this empirical equation allows the prediction of the maximum sorption potential, denoted Qmax, as given by:

$$Q=\frac{{Q}_{max}\cdot k\cdot DOC}{1+k\cdot DOC}$$
(1)

where k is the equilibrium constant, which represents the relative tendency of the forward (at high values of k) or reverse (at low values) reactions. The original Langmuir Eq. (1) assumes a zero intercept, that is, that the amount of sorbed DOC is zero when the DOC concentration in solution is zero. This is theoretically true in a completely reversible system, but does not account for the presence of native organic matter, some of which is released into solution during batch experiments at low DOC concentrations. To correct for this observed non-zero intercept, Lilienfein et al. (2004) introduced the intercept b, which represents the amount of DOC released into the solution when the DOC concentration is 0 mg/L and which has been used by subsequent batch experiment studies on native soils (Kothawala et al. 2009; Mayes et al. 2012). If we use the parameter b to account for the amount of DOC that leaves the native soil when no DOC is added, then any new DOC sorbed (Q) when DOC is added is additional to the organic C that already exists in the soil sample. It follows that the maximum sorption potential (Qmax) is also additional to the organic C that already exists in the soil sample. We therefore rename Qmax as Qsp, the additional sorption potential that could be realized given that some mineral surface sites are already occupied in the native soil.

$$Q=\frac{{Q}_{sp}\cdot k\cdot DOC}{1+k\cdot DOC}-b$$
(2)

In this study, Qsp, k, and b are parameters fitted to laboratory batch experiments where different amounts of DOC are added to a well-mixed soil sample.
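To make the roles of the three parameters concrete, the following minimal Python sketch evaluates Eqs. (1) and (2). The parameter values are hypothetical and chosen only for illustration; they are not fitted values from this study.

```python
def langmuir(doc, q_max, k):
    """Eq. (1): sorbed C (mg/kg) at solution concentration doc (mg/L)."""
    return q_max * k * doc / (1.0 + k * doc)


def langmuir_with_intercept(doc, q_sp, k, b):
    """Eq. (2): Eq. (1) shifted down by b, the DOC released at 0 mg/L."""
    return langmuir(doc, q_sp, k) - b


# Hypothetical parameter values (assumptions for illustration only).
Q_SP, K, B = 1000.0, 0.02, 50.0   # mg/kg, L/mg, mg/kg

for doc in (0.0, 10.0, 50.0, 200.0, 1e6):
    q = langmuir_with_intercept(doc, Q_SP, K, B)
    print(f"DOC = {doc:>9.1f} mg/L  ->  Q = {q:8.1f} mg/kg")
```

At DOC = 0 the function returns −b (net release from the native soil), at DOC = 1/k the Langmuir term reaches half of Qsp, and at very high DOC the curve approaches Qsp − b.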

Sorption of DOC is a nano-to-micro-scale process, but there is evidence that sorption potentials are related to macro-scale differences in climate and soil properties. For example, Mayes et al. (2012) found relationships between sorption potential measured by batch experiments and clay fraction, pH, and soil order. Kothawala et al. (2009) also found a relationship between sorption potential and clay fraction. Soil texture, especially clay and silt fraction, is a commonly used proxy for mineral surface availability in empirical models of soil C sequestration (Angers et al. 2011; Hassink 1997; Wiesmeier et al. 2018), as well as in process-based models of soil C cycling (Sulman et al. 2018). Changing the pH of soil (e.g., via root exudation) has direct effects on sorption to mineral surfaces (Keiluweit et al. 2015).

Soil order is not a measurable quantity like the clay fraction but rather is a taxonomic classification that synthesizes important functional information, such as the dominant or typical mineralogy, weathering process, environmental conditions, and vegetation cover. Sorption may also be affected by climate, either through indirect controls on mineral weathering (Gislason et al. 2009), direct control of temperature on sorption dynamics (Abramoff et al. 2019; Conant et al. 2011), or the influence of precipitation on DOC infiltration down the soil column (Kaiser and Guggenberger 2005). Climate may also have indirect effects on ecosystem properties affecting DOC production and availability such as net primary production and average decomposition rates.

In this paper, we estimate the potential for additional DOC sorption from a new compilation of equilibrium batch sorption experiments (Feng et al. 2014; Jagadamma et al. 2012; Kaiser et al. 1996; Kothawala et al. 2009; Mayes et al. 2012). We fitted the Langmuir equation parameters from those experiments to estimate the additional DOC sorption potential. We trained a machine learning model on the relationship between DOC sorption potential and widely-measured climate and geochemical variables. We then used this relationship to estimate the global DOC sorption potential (Qsp) for six soil orders (Alfisols, Entisols, Inceptisols, Mollisols, Spodosols, Ultisols), using data from 14,631 soil profiles distributed globally. Defining the potential contribution of DOC sorption to total soil organic C (SOC) stock is a first step to estimate the potential for soils to accrue stable C and to improve soil C models.

Methods

Compilation of batch experiment data

We compiled a database of published sorption experiments, where DOC was experimentally adsorbed to mineral soils using standard batch experiments (Feng et al. 2014; Jagadamma et al. 2012; Kaiser et al. 1996; Kothawala et al. 2009; Mayes et al. 2012). We selected these experiments based on the similarity of their methodology, as the methods of Feng et al. (2014) and Jagadamma et al. (2012) are based on Mayes et al. (2012), which is based on Kothawala et al. (2009), which is based on Kaiser et al. (1996). Thus, this database is not the result of meta-analysis, but rather represents a limited group of sorption experiments with similar methods. Soils were not fractionated but it is assumed that sorption to minerals occurs in the mineral fraction. DOC was added to mineral soil and allowed to equilibrate for 18–48 h, filtered, and analyzed for total C concentration before and after adsorption. To develop a relationship between the additional sorption potential (Qsp), equilibrium constant (k), and soil geochemical variables, we used five studies with a total of 402 samples from 111 locations in North America and Europe, across six soil orders (Alfisols, Entisols, Inceptisols, Mollisols, Spodosols, Ultisols) representing 62% of the Earth’s ice-free land surface (Feng et al. 2014; Jagadamma et al. 2012; Kaiser et al. 1996; Kothawala et al. 2009; Mayes et al. 2012). Samples from other soil orders were present in the original studies, but we included in this analysis only the soil orders for which there were more than 10 samples: Alfisols (N = 109), Entisols (N = 13), Inceptisols (N = 87), Mollisols (N = 71), Spodosols (N = 30), and Ultisols (N = 66). We also obtained reported Qsp and k estimates from published literature as well as raw sorption data from the studies. We used published values where available (N = 275) and refitted values (N = 127) where only raw data were available. We refitted the available isotherm data to the Langmuir Eq. (1) (Kothawala et al. 2009; Mayes et al. 2012). Some studies adjusted for the amount of DOC released into solution when the initial DOC concentration is 0 mg/L, b from Eq. (2), measured using solution blanks, before fitting the model without b (Jagadamma et al. 2012; Mayes et al. 2012). Other studies fitted or explicitly reported b. Where b was fitted in the original study, we refitted the Langmuir equation in the same way, estimating Qsp, k, and b. We summarize reported estimates of b (N = 186) in the Results section, but we did not attempt to extrapolate this quantity for several reasons. First, this value is generally used to control for DOC release from native soil occurring at 0 mg/L. The DOC released from native soil under these conditions may not have been protected by sorption and therefore does not necessarily represent the desorption potential. Second, soil C models that use the Langmuir equation to represent sorption do not typically include this parameter. Lastly, preliminary analysis with a Random Forest model found that, of the predictors in the laboratory experiment dataset, b was primarily related to the organic carbon concentration rather than to climatic or geochemical predictors.

The parameters Qsp and k were fitted using non-linear regression in R (package drc; Ritz et al. 2015). Across the five studies, 140 of the reported Qsp estimates and 86 of the reported k estimates had corresponding raw data that we were able to fit. Refitted values were correlated with those reported in the literature with minimal bias for Qsp (slope = 0.81, R2 = 0.74, P < 0.05) and k (slope = 0.96, R2 = 0.83, P < 0.05; Figure S1). Differences between reported and refitted values are likely due to differences in the method used for curve fitting. During refitting, we identified and removed one outlier that was greater than three standard deviations from the mean.
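The fitting step can be illustrated with a short, dependency-free Python sketch. It is not the drc routine used in this study: it swaps in a separable least-squares approach, in which k is scanned over a hypothetical grid and, for each candidate k, Qsp and b are obtained by ordinary least squares (Eq. 2 is linear in Qsp and b once k is fixed). The function name, the grid, and the synthetic data are assumptions made for illustration.

```python
def fit_langmuir(doc, q, k_grid):
    """Fit Q = Qsp*k*DOC/(1 + k*DOC) - b by separable least squares.

    For a fixed k the model is linear in Qsp and b, so each candidate k
    gets a closed-form least-squares solve; the k with the smallest
    residual sum of squares is kept.
    """
    n = len(doc)
    best = None
    for k in k_grid:
        x = [k * d / (1.0 + k * d) for d in doc]   # saturation fraction
        mean_x, mean_q = sum(x) / n, sum(q) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue                               # degenerate candidate
        sxq = sum((xi - mean_x) * (qi - mean_q) for xi, qi in zip(x, q))
        q_sp = sxq / sxx                           # slope of Q against x
        b = q_sp * mean_x - mean_q                 # minus the intercept
        rss = sum((q_sp * xi - b - qi) ** 2 for xi, qi in zip(x, q))
        if best is None or rss < best[0]:
            best = (rss, q_sp, k, b)
    if best is None:
        raise ValueError("no usable k in k_grid")
    return best[1], best[2], best[3]


# Synthetic, noise-free isotherm generated from made-up parameters.
doc = [0.0, 5.0, 10.0, 25.0, 50.0, 100.0, 200.0]          # mg/L
q = [1000.0 * 0.02 * d / (1.0 + 0.02 * d) - 50.0 for d in doc]
k_grid = [i * 0.001 for i in range(1, 101)]               # L/mg

q_sp, k, b = fit_langmuir(doc, q, k_grid)
print(q_sp, k, b)
```

A grid search is coarse compared with a gradient-based non-linear solver, so differences of this kind in the fitting method are one plausible source of the discrepancies between reported and refitted values noted above.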

In all studies but one, DOC was extracted from the soil organic horizon, leaf litter, or stream water. In Jagadamma et al. (2012), DOC solutions prepared from five C compounds (glucose, L-alanine, oxalic acid, salicylic acid, and sinapyl alcohol) were each reacted with every sample. For our database, we averaged the parameters across the five compounds for each sample.

Additional soil characteristics that were measured for > 90% of the samples collected were percent clay, pH (CaCl2), dithionite-extractable iron (Fe; mg/kg), and organic carbon content (mg/kg). A subset of the samples measured exchangeable calcium (Ca2+; cmol+/kg), oxalate- and pyrophosphate-extractable Fe, as well as dithionite-, oxalate-, and pyrophosphate-extractable aluminum (Al). Dithionite extraction was developed to extract Fe oxides because dithionite can reduce Fe3+ to Fe2+ but cannot dissolve Al oxides, and thus, any interpretation of Al dithionite extraction should consider that unlike the oxalate extraction, dithionite extraction was not developed to measure Al (McKeague and Day 1966).

We extracted mean annual temperature (MAT) and mean annual precipitation (MAP) for 1979–2000 from WorldClim Version 2 (Fick and Hijmans 2017) at 30 arc-second resolution for each location where soil was collected for batch experiments. Some batch experiments directly reported MAT and MAP at the locations where soil was collected. Reported values for MAT were correlated with extracted values with minimal bias (slope = 0.94, R2 = 0.89, N = 133), while reported values for MAP were slightly lower than extracted values (slope = 0.72, R2 = 0.65, N = 131).

Data analysis

Data interpolation for missing values

Many samples did not have measurements for every variable (0–86% missing; Figure S2), but many of the variables where measurements were not reported by the experiments were correlated with related variables where measurements were more complete (e.g., different types of metal extractions, different texture measurements; Figure S3). We estimated missing data using multiple imputation by chained equations. Missing data were assumed to have a multivariate normal distribution. Although a multivariate normal distribution may predict some values below zero, all of the interpolated values were positive so we did not alter this assumption. Some of the measured values used for imputation were greater than three standard deviations from the mean, but because all were reasonable values for the measurements they represented, we did not remove or transform any data as outliers. Multiple values were drawn from this distribution for each missing datum by Markov-Chain Monte Carlo, conditional on the other data in the data set. We used the classification and regression tree (CART) method to estimate the missing value from the available data (Van Buuren and Groothuis-Oudshoorn 2011). The distributions of the estimated values were reasonably well-matched to the distributions of the measured values, though some of the most common values were over-represented (Figure S4). For the predictors carried forward into the global-scale analysis (i.e., percent clay, pH, and soil order), only 6% of data were missing.

Training for machine learning

We trained Random Forest machine-learning algorithms (Liaw and Wiener 2002) to quantify the relative importance of different climate and soil characteristics in our observational dataset for predicting Qsp and k for measured samples (Fig. 1: Exploration). Then, we built another Random Forest model of Qsp and k using a subset of predictors (MAT, MAP, percent clay, pH, and soil order) which are available at global scales (Fig. 1: Training). For each Random Forest model, we log-transformed Qsp and k to improve model performance. Model performance was evaluated using R2 from permutation cross-validation, where the model is trained on 80% of the data and tested on the remaining 20%, and this procedure is repeated 99 times (package ‘rfUtilities’; Evans and Murphy 2018). Importance is defined as the mean increase in node purity (a measure of the homogeneity of the response values within each node) when the predictor is used to split regression trees in the model. We grew 500 trees for each model, and the number of variables tried at each split was set at the largest integer smaller than or equal to the number of predictors divided by three (Breiman 2001). We tested other values of the number of variables to try at each node, but the method above provided the best model fit. We used the ‘forestFloor’ package (Welling et al. 2016) in R to derive partial feature contributions for the predictors included in the Qsp and k Random Forest regression models. Partial feature contributions show the partial response of the dependent variable to changes in the independent variable across its range.
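The evaluation procedure described above (repeated random 80/20 splits, R2 on the held-out 20%) and the rule for the number of variables tried at each split can be sketched in Python as follows. This is an illustration under stated assumptions rather than the rfUtilities implementation: the function names are hypothetical, and a one-predictor least-squares line stands in for the Random Forest so that the example stays self-contained.

```python
import random


def r_squared(obs, pred):
    """Coefficient of determination on a held-out set."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def repeated_holdout_r2(x, y, fit, n_repeats=99, train_frac=0.8, seed=0):
    """Mean test-set R2 over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_train = int(train_frac * len(y))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        scores.append(r_squared([y[i] for i in test],
                                [predict(x[i]) for i in test]))
    return sum(scores) / len(scores)


def fit_line(x, y):
    """Stand-in model (NOT a Random Forest): one-predictor least squares."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return lambda v: my + slope * (v - mx)


def default_mtry(n_predictors):
    """Largest integer <= p / 3, kept at a minimum of 1."""
    return max(1, n_predictors // 3)


print(default_mtry(17), default_mtry(5))
```

With the 17 exploratory predictors this rule gives 5 variables per split, and with the 5 global predictors it gives 1.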

Fig. 1 Diagram of analyses performed. First, 17 predictors from the batch sorption experiments were used to explore the relationship between sorption parameters (Qsp and k) and environmental variables. Then, 5 predictors available at global scales were used to train a Random Forest model for global estimation of Qsp and k. Next, soil profile-level predictors from WoSIS and gridded predictors from SoilGrids were used to predict Qsp and k. Lastly, we estimated the uncertainty due to model and predictor error by creating 20 new predictions using alternate datasets and models

Estimating the DOC sorption potential at global scales

We used the model generated in the training step to estimate the spatial distribution of Qsp and k at global scales using climatic and geochemical predictors from profile and gridded data in Table 1 (Fig. 1: Prediction). We estimated Qsp (mg/kg) and k (L/mg) for 14,631 soil profiles from the World Soil Information Service (WoSIS) database (Batjes et al. 2017) (Table 1). We analyzed the whole soil profile from surface to 1 m depth, and repeated the analysis for 13,609 topsoil profiles (0–30 cm) and 13,993 subsoil profiles (30 cm–1 m) that reported depth-specific clay and pH values.

Table 1 Description of profile and gridded data used for estimating Qsp at global scales and alternate gridded data for uncertainty quantification of global totals of Qsp

Both Qsp and k were mapped globally using 10 km gridded fields of MAT and MAP, percentage clay, pH, soil order, bulk density, and SOC stock to 1 m (Table 1; Hengl et al. 2017). For each 10 km grid cell, Qsp and k were estimated in the same way as described above for WoSIS profiles. To estimate a global Qsp in Pg carbon for 6 soil orders, we converted Qsp from concentrations to stocks for each grid cell using the bulk density and volume of coarse fragments, following Hengl et al. (2017) and Tifafi et al. (2018).
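The concentration-to-stock conversion can be sketched as below. The formula is the conventional one (fine-earth mass per unit area equals bulk density times depth times the non-coarse volume fraction) and is assumed here rather than taken from the cited papers; the function name and the example grid-cell values are hypothetical.

```python
def qsp_stock_kg_m2(qsp_g_per_kg, bulk_density_kg_m3, depth_m, coarse_frac):
    """Convert a concentration (g C per kg soil) to a stock (kg C per m2).

    Fine-earth mass per m2 = bulk density * depth * (1 - coarse fraction).
    """
    fine_earth_kg_m2 = bulk_density_kg_m3 * depth_m * (1.0 - coarse_frac)
    return qsp_g_per_kg * fine_earth_kg_m2 / 1000.0   # g C -> kg C


# Hypothetical grid cell: Qsp 1.1 g C/kg, 1300 kg/m3, 1 m depth, 10% coarse.
stock = qsp_stock_kg_m2(1.1, 1300.0, 1.0, 0.10)
print(f"{stock:.3f} kg C m-2")
```

Multiplying each cell's stock by its area and summing over cells would then give a total in mass units such as Pg C.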

Estimating uncertainty

We estimated the uncertainty in the global-scale Qsp by accounting for predictor error and model error (Fig. 1: Uncertainty). To estimate predictor error we used four additional gridded predictor datasets (Alternate MAT, MAP, %Clay and pH; Table 1). We substituted each alternate dataset for the standard predictor dataset to generate four additional predictions. To estimate model error, we generated 5 Random Forest models from fivefold cross validation. The overall error is estimated from the standard deviation of the Qsp estimate from the 20 (4 datasets × 5 models) additional predictions.
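The structure of this calculation (every alternate predictor dataset paired with every cross-validation model, followed by a standard deviation over the resulting totals) can be sketched as follows. All inputs are placeholders: the labels stand in for gridded datasets and fitted models, the totals are made-up numbers, and the use of the sample standard deviation is an assumption.

```python
from itertools import product
from statistics import stdev


def prediction_spread(predict_total, predictor_sets, models):
    """Standard deviation of the global total over all dataset x model pairs."""
    totals = [predict_total(model, predictors)
              for predictors, model in product(predictor_sets, models)]
    return totals, stdev(totals)


# Placeholder inputs: labels stand in for gridded datasets and fitted models.
predictor_sets = ["alt_MAT", "alt_MAP", "alt_clay", "alt_pH"]
models = [f"fold_{i}" for i in range(1, 6)]


def toy_total(model, predictors):
    """Made-up totals (Pg C) so the sketch runs; not results from the study."""
    return (100.0 + 2.0 * models.index(model)
            + 3.0 * predictor_sets.index(predictors))


totals, sd = prediction_spread(toy_total, predictor_sets, models)
print(len(totals), round(sd, 2))
```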

Results

Influential variables controlling sorption in batch experiments

Our Random Forest models (Liaw and Wiener 2002) were used to identify the key climate and geochemical variables that influence Qsp per unit soil mass (mg/kg) from the laboratory experiment measurements (Fig. 1: Exploration). According to this model, the most influential variables were dithionite-extractable Fe, mean annual temperature (MAT) and clay percentage (R2 = 0.46, Fig. 2a). The equilibrium constant k (L/mg) was most strongly related to organic carbon content, pH, MAT, soil order, and mean annual precipitation (MAP) (R2 = 0.32, Fig. 2b).

Fig. 2 The mean increase in node purity when the predictor variable is used to split regression trees in the model for 17 predictors of (a) additional sorption potential (Qsp) in mg/kg and (b) the equilibrium constant (k) in L/mg. The most influential variables are shown from the top to the bottom of the graph

For global estimation of Qsp, we built a Random Forest model with only the five predictors in Fig. 2 that were available at global scales (Fig. 1: Training). Percent clay had the highest mean increase in node purity for the globally-applied Qsp model (R2 = 0.38, Table S1), with a positive relationship between Qsp and percent clay (Fig. 3a). There were also positive, mostly monotonic relationships between Qsp and MAT, and between Qsp and MAP. Qsp had a threshold relationship with pH, with a low Qsp in very acidic soils (pH < 4) and no relationship above that threshold (Fig. 3a). Soil order influenced Qsp, with Alfisols, Mollisols, and Ultisols tending to have higher Qsp relative to the other soil orders (Fig. 3a). The most important parameter for estimating k, measured by mean increase in node purity, was pH (R2 = 0.32, Table S1), which also had a threshold effect with low k in very acidic soils (pH < 4) and a negative relationship between k and pH at pH > 4 (Fig. 3b). There were non-monotonic relationships between k and MAT, MAP, and percent clay (Fig. 3b). Of the six soil orders, Spodosols and Ultisols were most likely to have higher values of k (Fig. 3b).

Fig. 3 Partial feature contributions of percent clay, pH, Soil Order, mean annual temperature (MAT), and mean annual precipitation (MAP) for predicting (a) the additional sorption potential (Qsp) and (b) the equilibrium constant (k). Symbol colors reflect the position on the x-axis of the most important variable, (a) Clay (%) and (b) pH, respectively. For example in (a), sites with values of pH 3 are mostly red symbols, indicating that they also have low values of Clay

We could not use a measure of reactive or extractable Fe and Al to predict Qsp at global scales because of the lack of sufficient gridded or multi-site data. Nevertheless, dithionite-extractable Fe was the most important predictor of Qsp in our Random Forest model based on the sorption batch experiment data (Fig. 2a). Although many Fe and Al measurements in measured profiles were missing and had to be imputed, the correlations between extractable metals and Qsp were more strongly positive than those for any other single predictor in the dataset (Figure S3). This finding, together with the wealth of research about the importance of extractable Fe and Al for predicting C storage and mineral stabilization (Kaiser and Guggenberger 2000; Kalbitz and Kaiser 2008; Kramer and Chadwick 2019; Rasmussen et al. 2007; Schrumpf et al. 2013; Torn et al. 1997), suggests a strong need for a global dataset of those extractable metals.

The addition of other soil properties to the Random Forest model of Qsp, especially extractable Fe and Al content, increased the variance explained by 8%. It is possible that adding other soil properties such as C, N, and other nutrient contents, bulk density, cation exchange capacity, base saturation, and soil moisture would increase the variance explained for the sorption potential. However, these variables were either not available or mostly missing in the training dataset, and would have had to have been estimated from map-based products, which may well represent the average value at large scales but not field-scale measurements. Soils in particular are very heterogeneous, and some regionally or globally-applied models of soil organic C concentration (R2 = 0.23 in Hengl et al. 2014, improved to R2 = 0.64 in Hengl et al. 2017), SOC stock (R2 = 0.54 in Sanderman et al. 2017), SOC stock without land use (R2 = 0.34 in Sanderman et al. 2017), or dissolved organic C (R2 = 0.36 in Langeveld et al. 2020) explain less variance than models more related to plant processes such as mycorrhizal fungi type (R2 \(\approx\) 0.5–0.8 in Steidinger et al. 2019) or gross primary production (R2 = 0.7 in Tramontana et al. 2015). These limitations may be overcome with additional soil property measurements at the level where other measurements are collected, or a better understanding of soil spatial heterogeneity.

The amount of DOC released into solution when the initial DOC concentration is 0 mg/L, b from Eq. (2), has a median value of 0.08 (0.01, 0.77) g C kg soil−1, which is an order of magnitude lower than the amount that can be added to soil (i.e., Qsp), and is positively correlated with organic carbon content (\(\rho\) = 0.69). Since this value is related to the amount of total C, it may include some desorbed C but may also include C released from particulate organic matter such as litter.

Influential variables controlling global-scale patterns of sorption

A map of carbon storage potential (Qsp) concentration in g C kg soil−1 was generated by applying the global-scale Random Forest model to predictors of Qsp for 14,631 soil profiles from the World Soil Information Service (WoSIS) database (Fig. 4a; Batjes et al. 2017), and for each grid cell of the globe using the SoilGrids data products (Fig. 4b; Hengl et al. 2017). Across a global range of soil profiles (N = 14,631), the median potential of additional C storage from DOC sorption, Qsp, was 1.1 g C kg soil−1 (0.43 to 1.9 for the 95% Confidence Intervals; CI). DOC sorption potential values were summed to estimate a global additional sorption potential of 107 \(\pm\) 13 Pg C across 6 soil orders to 1 m depth, that is, a 7% increase in the current global SOC stock of these soil orders (1615 Pg; Table 2).
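As a quick check on the percentage quoted above, dividing the estimated additional sorption potential by the current SOC stock of the six soil orders gives:

$$\frac{107\ \mathrm{Pg\ C}}{1615\ \mathrm{Pg\ C}}\approx 0.066\approx 7\%$$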

Fig. 4 (a, b) Additional sorption potential (Qsp) and (d, e) equilibrium constant (k) across a global range of soil profiles (N = 14,631) representing 6 soil orders. In (a, d) median values from each profile are binned hexagonally, and in (b, e) values are estimated using the SoilGrids gridded global data products. (c) The additional sorption potential (Qsp) by latitude, estimated using SoilGrids gridded global data products. The black line is the median Qsp and the gray shading represents the 5th and 95th percentile range

Table 2 Summary of estimated quantities per volume of soil, the number of profiles used to estimate the quantity, the depth increment, and the global estimate for 6 soil orders

The additional DOC sorption potential Qsp was highest in low- and mid-latitude soils with high clay content, and lowest in high-latitude soils dominated by organic matter (Fig. 4a–c). We found that high Qsp values prevail in parts of eastern North America, the Amazon, central Africa, and Indonesia. These areas have a high percent clay, higher MAT, and moderately acidic pH (Figure S5a–c). High k occurred in areas where pH was between 4 and 5 (Figs. 3b, 4d, e).

Across the individual WoSIS profiles, Alfisols, Mollisols, and Ultisols had higher average Qsp relative to other soil orders (Figure S6a). In the same profiles, Spodosols and Ultisols were estimated to have higher average values of k (Figure S6b). Although we excluded Gelisols and Histosols from this analysis because they were poorly represented in the laboratory sorption experiments, high-latitude profiles from other soil orders were estimated to have consistently low Qsp, implying that most mineral sites are already occupied by organic C. Treating grid cells as independent, there is a weak, negative correlation (\(\rho\) = −0.27) between Qsp in units of stock and total organic carbon stock (kg m−2), suggesting that soils with high initial organic C stocks will have less capacity to accrue more C via sorption. The negative influence of organic matter on sorption of various chemical species has been observed in previous sorption experiments (Johnson and Todd 1983; Klotzbücher et al. 2020; Redman et al. 2002).

We subset the soil profiles from the WoSIS database to look at the difference in additional DOC sorption potential between two soil depths representing topsoils (0–30 cm, N = 13,609) and subsoils (30–100 cm, N = 13,993). We found that Qsp was higher in the subsoil (Figure S7a), while k was not different between topsoil and subsoil (Figure S7b). Subsoils therefore have a greater capacity to sorb additional DOC, likely because mineral concentrations and therefore available mineral surface area increase with depth, while soil C concentrations decrease with depth (Figure S7c). We did not have sufficient data to extend our analysis to soils deeper than 1 m, but acknowledge that substantial C is stored in soils at depths > 1 m, which, like soils at 30–100 cm, are likely to have higher mineral surface area relative to topsoils (Batjes et al. 1996; Jobbágy and Jackson 2000).

Discussion

This study provides an estimate of the potential additional C that could be sorbed to minerals in soils spanning six soil orders and a range of climate conditions globally. Our best estimate of 107 Pg C should be interpreted as the maximum of additional C that can potentially be sorbed as DOC. Our estimated sorption potential for six soil orders is similar in magnitude to the 116 Pg estimated SOC stock lost from human agricultural activity over the last 12,000 years (Sanderman et al. 2017). Although we do not know the land use history of our soils, and disturbed soil may have a different DOC sorption potential compared to undisturbed soil, this result suggests that the potential of soil to sorb more DOC may make it possible to compensate at least partially for the loss of SOC due to soil degradation on a global scale. The inferred 7% increase in SOC stocks shows that if the DOC sorption potential could be realized for all soils, disturbed and undisturbed, it would sustain the ‘4 per 1000’ goal for 12 years.
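One way to arrive at the 12-year figure, assuming it is derived from the 8.9 Pg C year−1 sequestration target of the ‘4 per 1000’ initiative cited elsewhere in this Discussion, is:

$$\frac{107\ \mathrm{Pg\ C}}{8.9\ \mathrm{Pg\ C\ year^{-1}}}\approx 12\ \mathrm{years}$$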

In principle, some increase in soil C sequestration could be achieved by increasing DOC concentration in the soil. This could be accomplished by increasing plant inputs, reducing DOC consumption by microorganisms, and changing soil moisture and water flow, which affect local DOC concentration and leaching. The latter options are more complicated to manipulate, whereas land management options that increase C inputs into the soil are well-established, e.g., cover crops, organic fertilizer amendments, moderate grazing, or agroforestry (Cardinael et al. 2018; Maillard et al. 2017; Poeplau and Don 2015). These options may be particularly useful in the croplands of eastern North America and managed or converted forests of the tropics, which have high additional sorption potential (Fig. 4a, b). It would also be possible to increase soil C sequestration by changing the environmental conditions known to affect k, such as pH, keeping in mind that making modifications to pH may affect many other aspects of soil quality. Although beyond the scope of this study, the chemical composition of DOC (Jagadamma et al. 2012) and seasonality of inputs may also affect the sequestration potential. Nevertheless, DOC is generally highly reactive if not sorbed and any change in environmental conditions inducing DOC desorption may drastically increase soil CO2 emissions.

We do not argue that the sorption potential estimated in this study is fully realizable. In fact, the DOC sorption potential in field conditions is likely lower than that measured and extrapolated from batch experiments. Batch experiments, like most soil laboratory experiments, homogenize the soil before adding DOC, which disrupts the soil structure and preferential flow paths that would limit the access of DOC to soil minerals in situ (Kaiser and Guggenberger 2005). Therefore, the sorption potential that we estimate here is a maximum assuming that mineral sites in soil are equally accessible to DOC, a condition which is rarely satisfied in the field.

On the other hand, batch sorption experiments may underestimate the C sequestration potential of DOC because they only consider DOC sorption which occurs on a fast timescale. Batch sorption experiments are brought to equilibrium over a period of 24–48 h. This equilibration is sufficient to associate DOC with soil minerals under saturated conditions but does not account for microbial turnover or structural changes, such as aggregation, that occur over longer periods of time. Also, soils contain a wide range of forms of organic matter, interacting with minerals along different pathways (Kaiser and Guggenberger 2007; Masiello et al. 2004; Mikutta et al. 2011; Weng et al. 2017). For example, microbial growth, consumption of DOC, and subsequent turnover into necromass can contribute to mineral-associated C as well as total SOC (Kallenbach et al. 2016). Therefore, a larger increase of SOC than that observed in the batch sorption experiments might be possible due to mechanisms that operate over longer timescales.

Though DOC sorption is a fast process, the rate of C sequestration by this mechanism is still limited by the amount of nitrogen and phosphorus required to sequester C as soil organic matter. Recent discussions have pointed out that it would require 593 Tg N year−1 (assuming a soil C:N of 15) and 35–75 Tg P year−1 to achieve the ‘4 per 1000’ goal of sequestering 8.9 Pg C year−1, the rate of anthropogenic emissions (Davies et al. 2020; Spohn 2020). The amount of N required is greater than global natural and anthropogenic nitrogen fixation combined, 413 Tg N year−1 (Fowler et al. 2013), and the amount of phosphorus required is a substantial proportion of global phosphate rock mined each year, 240 Tg P year−1 (USGS 2020). As a result, soil C sequestration that draws on coupled nutrient cycles is likely only achievable at a rate well below that of anthropogenic emissions.
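The stoichiometric argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration built from the figures quoted in this paragraph; the function and variable names are hypothetical, and the only conversion assumed beyond the text is 1 Pg = 1000 Tg.

```python
# Back-of-envelope check of the nutrient demand quoted in the text.
# All input figures come from the paragraph above; names are illustrative only.

PG_TO_TG = 1000.0  # 1 Pg = 1000 Tg


def nutrient_demand_tg(c_rate_pg_per_yr: float, c_to_nutrient_ratio: float) -> float:
    """Nutrient (Tg/yr) needed to sequester C at a given C:nutrient mass ratio."""
    return c_rate_pg_per_yr * PG_TO_TG / c_to_nutrient_ratio


c_target_pg = 8.9          # Pg C/yr, the '4 per 1000' goal cited in the text
soil_c_to_n = 15.0         # soil C:N assumed in the text
n_fixation_tg = 413.0      # Tg N/yr, natural + anthropogenic fixation
p_demand_tg = (35.0, 75.0) # Tg P/yr, range quoted in the text
p_rock_tg = 240.0          # Tg P/yr, mined phosphate rock figure quoted in the text

# N demand implied by the C target and the assumed C:N ratio
n_demand_tg = nutrient_demand_tg(c_target_pg, soil_c_to_n)

# How the demand compares with the available supply
n_demand_over_fixation = n_demand_tg / n_fixation_tg
p_share_of_mined = tuple(p / p_rock_tg for p in p_demand_tg)

print(f"N demand: {n_demand_tg:.0f} Tg N/yr")
print(f"N demand / global fixation: {n_demand_over_fixation:.2f}")
print(f"P demand as share of mined P: {p_share_of_mined[0]:.0%} to {p_share_of_mined[1]:.0%}")
```

Under these inputs the N demand exceeds total global fixation, which is the basis for the conclusion that nutrient-coupled sequestration cannot match the rate of anthropogenic emissions.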

Our results also have implications for models. Soil decomposition models that use Langmuir sorption to represent all interactions in the mineral fraction typically have k constants orders of magnitude higher, e.g., \(2.5 \times 10^{-1}\) (Abramoff et al. 2018; Robertson et al. 2019) and \(9 \times 10^{3}\) m\(^3\)/g C (Ahrens et al. 2015), than the values estimated from the 402 batch sorption experiments in this study (\(10^{-4}\) to \(10^{-2}\) m\(^3\)/g C), suggesting that processes other than sorption contribute to C accumulation in the mineral fraction (Abramoff et al. 2018; Wang et al. 2013) and that models may over-tune sorption as a process to create C storage. Mineral-associated C is itself operationally defined, with many potential methods of physical and chemical fractionation in use (Poeplau et al. 2018). As a result, mineral-associated C may form not only through direct sorption but also through coprecipitation (Mikutta et al. 2011), organic-organic interactions (Kleber et al. 2007), or occlusion within aggregates (Rasmussen et al. 2005; Six and Paustian 2014). These additional mechanisms can reduce the amount of mineral-associated C that is in contact with aqueous DOC, and therefore reduce the amount that can be readily exchanged (Kleber et al. 2007; Zhuang et al. 2008). Models are emerging that attempt to partition these varied processes into categories that can be more directly measured (Abramoff et al. 2018; Robertson et al. 2019).
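To make the consequence of these differences in k concrete, the sketch below evaluates the fractional occupancy of sorption sites under the standard textbook Langmuir form, \(\theta = kC/(1 + kC)\). This form is assumed here for illustration and is not necessarily the exact formulation used in any of the cited models. The DOC concentration of 10 g C/m\(^3\) is a hypothetical value chosen only for the example, and all k values are assumed to share the units m\(^3\)/g C.

```python
# Illustrative comparison of Langmuir site occupancy for different affinity constants k.
# Assumptions (not from the source): the textbook Langmuir form theta = k*C / (1 + k*C),
# a hypothetical DOC concentration, and identical units (m3/g C) for every k value.


def langmuir_occupancy(k_m3_per_g: float, doc_g_per_m3: float) -> float:
    """Fraction of sorption sites occupied at equilibrium (dimensionless, 0 to 1)."""
    x = k_m3_per_g * doc_g_per_m3
    return x / (1.0 + x)


doc_concentration = 10.0  # g C/m3, hypothetical value for illustration only

k_values = {
    "batch experiments, low end": 1e-4,
    "batch experiments, high end": 1e-2,
    "model value 2.5e-1": 2.5e-1,
    "model value 9e3": 9e3,
}

for label, k in k_values.items():
    theta = langmuir_occupancy(k, doc_concentration)
    print(f"{label:30s} k = {k:8.1e}  occupancy = {theta:.4f}")
```

At the same DOC concentration, the larger model constants push occupancy toward saturation, whereas the batch-derived constants leave most sites unoccupied. This is one way to see why a model that routes all mineral-associated C through sorption needs a much larger k than batch experiments support.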

According to our results, a greater proportion of soil C can be sorbed to minerals in highly weathered soils (e.g., Ultisols) relative to less weathered soils (e.g., Inceptisols; Figure S6a). Presumably this is due to the increased prevalence of mineral surfaces per volume of soil in highly weathered soils relative to less weathered soils. However, we did not consider how the types of clay minerals found in soils with different weathering status may affect our predictions. For example, clay minerals commonly found in highly weathered soils (e.g., kaolinite) can have low reactivity (Doetterl et al. 2018), which may reduce their effective sorption potential per volume of soil relative to less weathered soils. Of the soil orders not included in this analysis, some are expected to have relatively high sorption potential because they are characterized by the presence of secondary minerals (Andisols and Oxisols) or by high clay content (Vertisols). Others we would hypothesize to have low sorption potential due to environmental limitations on weathering (Aridisols) or strong environmental controls on C storage (Gelisols, Histosols). Future work to expand measurements in these soil orders could answer questions about the role of weathering and clay minerals while also contributing to a complete estimate of the global sorption potential.

Our study points to important gaps in our understanding of stabilized organic C. How much of the theoretical potential of stable C storage is realizable? For example, Sanderman et al. (2017) found that human agricultural activity has created a C deficit of 116 Pg globally, but suggested that only part of this amount is recoverable through management. What is the importance of sorption relative to other mechanisms for soils under different climatic or management regimes? Resolving these knowledge gaps will be necessary if soils are to contribute to resolving the global C imbalance caused by the burning of fossil fuels.