Introduction

Most of the factors related to spatial variation in soil organic carbon (SOC) stocks have been known for some time, as has the reality that their relative influences vary across scales (Wiesmeier et al. 2019). From molecular structure and particle size at pore to ped scales (Sollins et al. 2006; von Lutzow et al. 2006), to topography and moisture at pedon to landscape scales (Doetterl et al. 2016; Adhikari et al. 2020), to climate and vegetation at regional to global scales (Jobbagy and Jackson 2000; Post et al. 1982), a large body of research readily explains why SOC varies so remarkably from place to place. The strongest studies also acknowledge that soils are dynamic through time, unique at sites and scales that cannot be captured by even strong generalizations, and therefore conclude that further investigations into factors influencing SOC storage will continue to refine our understanding of how much is present, where, and why.

In the body of research exploring patterns and predictors of variation in SOC storage, a vast number of studies have reported factors influencing SOC for part of the spectrum of spatial scales (e.g., Davidson 1995; Goidts et al. 2009; Huang et al. 2017; Minasny et al. 2013; Mishra et al. 2010; Patton et al. 2019a; Paustian et al. 1997; Schimel et al. 1994; Thompson and Kolka 2005; Wynn et al. 2006). The ability to make inferences across the full range of spatial scales from global to landscape has largely derived from reviews of this literature, such as Wiesmeier et al. (2019, and references therein), and from multi-site inventories or large data syntheses that have used extensively (but not intensively) distributed observations to assess patterns at regional or larger scales (e.g., Doetterl et al. 2016; Cotrufo et al. 2019). In the context of these approaches to addressing SOC stocks and predictors as a function of scale, large networks that allow for investigation of patterns across and within sites have much to add, especially when such networks are sampled expeditiously and according to common protocols.

Studies that explore processes of SOC and soil organic matter (SOM) stabilization report mechanisms that may relate to patterns of SOC storage. This literature has particularly focused on soil physicochemical and biogeochemical mechanisms promoting SOC stability, such as physical protection, mineral or metal association, and molecular complexity (Crow et al. 2007; Kallenbach et al. 2016; Kramer and Chadwick 2018; Mao et al. 2000; Preston and Schmidt 2006; Sollins et al. 2006; Six et al. 2002; von Lutzow et al. 2006). These and other studies have specifically pointed to extractable metals (e.g., Fe, Al), exchangeable base cations (e.g., Ca, Mg), and soil fine fraction contents (especially clays) as having controlling influence over the stability of SOC (Chen et al. 2019; Heckman et al. 2018a, b; Lawrence et al. 2015; Rasmussen et al. 2018). If these processes and mechanisms that confer SOM persistence also result in larger quantities of SOC being present at a point in time, they may provide a foundation for hypotheses addressing physicochemical predictors of variation in SOC stocks at varying scales.

The literature on pedogenesis and soil taxonomy also offers a strong foundation for assessments of SOC patterns and predictors at distinct scales. Genetic soil taxonomy uses morphologic and physicochemical properties to infer processes of soil formation, many of which involve gains, losses, transfers, and transformations of materials that are dominantly composed of, or critically affected by, C (Marbut 1921; Simonson 1959). In light of the many interacting processes and factors in soils, all of which vary continuously, genetic soil taxonomy provides a structure for managing this complexity, creating categorical groups that reflect fundamental differences between soils. This categorical system is also hierarchical, integrating broad distal (e.g., climatic) and local proximal factors (e.g., physicochemical properties) at steadily increasing resolution, to describe soils ultimately as unique bodies, which often differ in SOC stocks (Wills et al. 2013). As soils are inherently multi-factor, these distal and proximal factors are not completely independent of each other. Nonetheless, this pedogenic framework provides structure for a conceptual model (Fig. 1) that can be applied at any number of scales. In this model, distal and proximal controls mediate each other, with distal controls dominant in extreme climates (frozen, saturated, or arid conditions) and proximal controls becoming more important in the mesic, temperate middle, and within landscapes where climatic variation is narrower.

Fig. 1

Conceptual model relating proximal versus distal influences on SOC storage across ecologic and pedologic scales of investigation. At broad scales and where climate is extreme, distal climatic factors have greater influence on SOC stocks. At more localized scales and in moderate climates, proximal variation (e.g., in physicochemical properties) is more important

The present study is based upon this conceptual model, which seeks to reconcile the more proximal factors recognized in the mechanistic biogeochemical literature with the distal climatic factors long associated with more ecological analyses of SOC. This study is enabled by the continental-scale National Ecological Observatory Network (NEON); as such it is intended to provide context for studies at and across NEON sites, and to test hypotheses related to SOC stocks and their variation as influenced by scale and study design. Data for testing these hypotheses derive from two complementary campaigns: one sampled profiles from closely adjacent intensive plots and physically composited similar horizons, and the other sampled dozens of pedons across the landscape at each site. Owing to the differing levels of replication of these two campaigns, they afford opportunities to assess SOC stocks across the entire network and within sites, i.e., at continental and landscape scales. Our six hypotheses, enumerated below, are informed by literature reporting predictors of SOC storage across scales, and by the SOM stabilization literature, though it is important to note that they address SOC stocks in terms of patterns, not stabilization as a mechanistic process. (1) Regarding whole soil profiles, we hypothesized that the two designs reveal the same continental-scale patterns and sources of variation in terms of soil taxonomy, climate and soil wetness influences on SOC stocks. (2) We further hypothesized that profile SOC stocks vary according to soil wetness within landscapes in the same pattern as at the continental scale. (3) Regarding discrete genetic horizons, we hypothesized that A horizon stocks are predicted more by land cover/use than parent material, with the reverse being true for B horizons, and the pattern being consistent for the two study designs.
(4) Regarding illuvial horizons specifically, we hypothesized that within-site replication reveals a wider suite of physicochemical variables influencing SOC stocks in B horizons, even at the continental scale, and (5) that the physicochemical predictors of B horizon SOC differ between distinct pedogenic pathways. (6) Lastly, we hypothesized that, within landscapes consisting of soils forming along distinct pedogenic pathways, the predictors of B horizon SOC stocks are the same, and follow the same patterns as observed at the continental scale.

Methods

Study sites and data sources

The NEON design includes 47 terrestrial sites, distributed across 20 ecoclimatic domains spanning the conterminous United States (CONUS), Alaska, Hawaii, and Puerto Rico. This study utilizes soil observations from the 40 CONUS and Alaska NEON sites comprising the Soil Organic Matter—Mechanisms Of Stabilization (SOM-MOS) project (Heckman et al., this issue). In this paper, we present results from two complementary sampling campaigns at these 40 NEON sites. The first is the SOM-MOS project, which sampled five profiles in close association with the centrally located eddy-covariance tower at each NEON site, on the locally dominant soil map unit. The SOM-MOS project conducted standard soil characterization tests and complemented them with a suite of mechanistic measurements, including density fractionation, radiocarbon, laboratory incubations, and a wide range of spectroscopic analyses. The present analysis does not address mechanisms and utilized only the standard soil characterization data from SOM-MOS. The second campaign is the network-wide, NEON-coordinated sampling of dozens of soil profiles collected across the landscape at each site, intended to characterize standard soil properties (the initial characterization campaign). The former of these campaigns is referred to hereafter as the “non-replicated” or SOM-MOS dataset, to distinguish its limited, highly localized within-site replication from that of the “replicated” or initial characterization dataset.

SOM-MOS soil sampling and data synthesis

Samples were collected for the SOM-MOS project from each of the five centrally located Soil Array Plots immediately surrounding the eddy-covariance tower at each NEON site between February 2015 and October 2018. At each site, NEON staff used a hydraulic corer (7.6 cm diameter for permafrost, 4.5 cm diameter for all others) to sample in 1 m increments to a depth of refusal by rock, 2 m (most soils), or 3 m (when possible in permafrost soils). Cores, capped and contained in clear butyrate plastic liners, were shipped on ice to the Oregon State University Core Marine Geological Repository Laboratory, where they were stored up to 8 weeks at 4 °C until processing and description, typically within 2 weeks of arrival, according to U.S. Department of Agriculture-Natural Resources Conservation Service (USDA-NRCS) protocols (Schoeneberger et al. 2012). During processing, the five SOM-MOS profiles per NEON site were described individually, and identical genetic horizons were composited across those cores possessing them. Individual composited horizon samples were homogenized, split, and shared among collaborating facilities where characterization was completed according to NRCS methods (Burt et al. 2004) as detailed in Heckman et al. (this issue). We addressed variation in the designations and thicknesses of major horizons across cores by expressing total profile SOC stocks mathematically as site-level “composite profiles” based on the number of cores in which a given horizon was observed and the thicknesses of that horizon among those cores in which it was observed. We averaged the thicknesses of similar and composited horizons across cores, where the thicknesses of horizons that were absent in a core equaled zero. Horizons from all cores were reassembled into a master chronology and the computed thicknesses were then used to calculate depths for each horizon. 
Bulk density was measured on every horizon from each core, and a weighted average based on thickness was used to calculate bulk density of the composite horizon. Of the individual horizons ultimately present in each site-level composite profile, only the uppermost mineral (typically A), uppermost B, and lowermost B horizons were subjected to full laboratory characterization, e.g., C concentrations, extractable ion concentrations, particle size distributions. For purposes of whole profile SOC stock calculations, organic C concentrations for non-characterized horizons were modeled from diffuse reflectance Fourier transform mid-infrared (FT-IR) spectra of those horizons, according to methods described in detail elsewhere (Dangal et al. 2019). Briefly, a local modeling approach called Memory Based Learning (Ramirez-Lopez et al. 2013) was applied using the USDA-NRCS Kellogg Soil Survey Laboratory (KSSL) spectral library, consisting of > 50,000 soil samples with FT-IR spectra and measured OC data. The FT-IR based predictions showed excellent performance for the subset of SOM-MOS samples that were independently measured for OC concentration (R2 = 0.99, n = 87). Of the total 289 OC concentrations in the SOM-MOS dataset, 117 were measured by elemental analysis, 121 were predicted from their FT-IR spectra, 28 were linear interpolations of values for the super- and subjacent horizons, and 23 were assumed to be equal to the value reported for that site and most similar horizon in the NEON Megapit soil characterization data product (National Ecological Observatory Network (NEON) 2020; DP1.00097.001). We computed SOC stocks of each SOM-MOS horizon as % C × Db × horizon thickness, scaled to Mg C ha−1, and truncated the OC stocks of horizons spanning 100 cm to a depth of 100 cm.
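The stock calculation described above can be illustrated with a minimal sketch. It assumes % C is expressed as percent of mass, Db in g cm−3, and depths in cm, and it omits any coarse fragment correction because none is described here. The function names and the example profile are hypothetical and are not taken from the SOM-MOS workflow.

```python
def horizon_soc_stock(pct_c, db, top_cm, bottom_cm, max_depth_cm=100.0):
    """SOC stock (Mg C per ha) of one horizon, counted only above max_depth_cm.

    pct_c: organic C concentration (% of mass)
    db: bulk density (g per cm^3)
    top_cm, bottom_cm: upper and lower horizon depths (cm)
    """
    # Truncate a horizon that spans the depth limit; a horizon lying
    # entirely below the limit contributes zero thickness.
    thickness = max(0.0, min(bottom_cm, max_depth_cm) - top_cm)
    # (% C / 100) * Db * thickness is g C per cm^2, and 1 g per cm^2
    # equals 100 Mg per ha, so the two factors of 100 cancel.
    return pct_c * db * thickness


def profile_soc_stock(horizons, max_depth_cm=100.0):
    """Sum horizon stocks for a list of (pct_c, db, top_cm, bottom_cm) tuples."""
    return sum(
        horizon_soc_stock(pct_c, db, top, bottom, max_depth_cm)
        for pct_c, db, top, bottom in horizons
    )


# Hypothetical three-horizon profile; the last horizon spans 100 cm.
example_profile = [
    (2.5, 1.2, 0, 20),    # A horizon
    (0.8, 1.4, 20, 80),   # upper B horizon
    (0.3, 1.5, 80, 130),  # lower B horizon, truncated at 100 cm
]
print(profile_soc_stock(example_profile))
```

Truncating by thickness in this way treats OC as evenly distributed with depth inside the horizon that spans 100 cm.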

NEON initial characterization soil sampling and data synthesis

At each NEON site, 10–26 soil profiles were observed, described, and quantitatively sampled by professional soil survey staff from NRCS according to standard field methods (Schoeneberger et al. 2012). Sampling took place between September 2015 and August 2018, with sampling at most sites typically occurring within a one-week period. At each site, profiles were sampled from a subset of the Distributed and Tower Base Plots, which are collectively arrayed across a landscape of hundreds to thousands of hectares surrounding each NEON site’s centrally located eddy-covariance tower. Soil survey staff with local expertise selected a subset of these plots at each NEON site for sampling, with the intent of capturing the range of variability in dominant soil map units within the footprint of the overall NEON site. Most profiles were excavated as pits (i.e., full pedons); at some sites or under specific soil conditions (e.g., deep dry sands or saturated organic wetland soils), augers were used to observe, describe, and collect samples. After collection, samples were shipped on ice to the KSSL in Lincoln, NE, where they were processed and characterized according to standard protocols (Burt et al. 2004). These characterization data, as well as descriptive information such as site and pedon descriptions, are available from NEON at https://www.neonscience.org/data-collection/soils-sediments.

We began our analysis of NEON Distributed Plot soils with data for 2627 individual soil horizons from 615 profiles across 40 NEON sites, as sampled and characterized by NRCS. Soil C concentrations were determined as percent total C at the KSSL using an elemental analyzer; for soils containing measurable inorganic C concentrations, we subtracted the percent inorganic C (reported as % of mass in CaCO3 equivalents) from the total C to yield a computed % organic C value. We assumed that % total C = % organic C for soils that did not report a % inorganic C value (n = 565); of these, 491 were pH < 7.0. We calculated SOC stocks using C concentrations and bulk density (Db) values, a majority of which were measured by the clod method (n = 1213; Burt et al. 2004) or as oven-dry soil mass divided by soil volume at field moisture content (n = 220). We created a structured approach to gap-filling the 1194 missing Db values that emphasized measurements most closely aligned with the fine earth fraction. In order of preference and limited by data availability, we gap-filled Db data using the (1) mean of the clod measurements for the other samples with an identical horizon designation from that NEON site (n = 695), (2) mean of that master horizon across the entire dataset of measured clod values (n = 166), (3) site-level estimate for that horizon based on the most closely matched SOM-MOS (n = 169) or NEON Megapit (n = 103) samples, or (4) mean of the field-collected samples with the most similar horizon designations (n = 61). Although measurements are preferred, even carefully measured Db values are still only estimates of a notoriously variable soil property (Patton et al. 2019a, b; Throop et al. 2012). 
Our gap-filling approach (using the average or point estimate of measured values for similar horizon types) avoids problems of non-independence arising when one soil property of interest (Db) is predicted from another (% C) via a pedotransfer function; considering the distribution of gap-filled values across the entire Distributed Plot dataset, it is unlikely to result in any directional bias in our SOC stock calculations. Furthermore, we conducted a critical statistical appraisal of gap-filled estimates generated from pedotransfer functions (% C vs. Db) vs. horizon means, which revealed no benefit of pedotransfer functions in terms of prediction intervals, and consistently significant differences between discrete types of horizons for the horizon mean approach. Following the same calculations as with the SOM-MOS dataset, we computed SOC stocks of each sampled horizon in the Distributed Plot dataset as % C × Db × horizon thickness, scaled to Mg C ha−1, and truncated the OC stocks of horizons spanning 100 cm to a depth of 100 cm, assuming a homogeneous vertical distribution of OC within such horizons.
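The ordered gap-filling of Db described above can be sketched as a simple fallback chain. This is an illustrative sketch only: the record layout, the function name, and the `site_estimates` lookup (standing in for matched SOM-MOS or Megapit values) are assumptions, and the fourth fallback (field-collected samples with the most similar horizon designations) is omitted for brevity.

```python
from statistics import mean


def gap_fill_db(horizons, site_estimates=None):
    """Return a (db, source) pair for each horizon record.

    horizons: list of dicts with keys "site", "horizon" (full designation),
        "master" (master horizon letter), and "db" (measured value or None).
    site_estimates: hypothetical lookup {(site, master): db} standing in for
        matched SOM-MOS or Megapit estimates.
    """
    site_estimates = site_estimates or {}
    by_site_horizon = {}
    by_master = {}
    # Pool the measured values used by the first two fallbacks.
    for h in horizons:
        if h["db"] is not None:
            by_site_horizon.setdefault((h["site"], h["horizon"]), []).append(h["db"])
            by_master.setdefault(h["master"], []).append(h["db"])

    filled = []
    for h in horizons:
        site_key = (h["site"], h["horizon"])
        if h["db"] is not None:
            filled.append((h["db"], "measured"))
        elif site_key in by_site_horizon:
            # (1) same horizon designation at the same site
            filled.append((mean(by_site_horizon[site_key]), "site_horizon_mean"))
        elif h["master"] in by_master:
            # (2) same master horizon across the whole dataset
            filled.append((mean(by_master[h["master"]]), "master_horizon_mean"))
        elif (h["site"], h["master"]) in site_estimates:
            # (3) externally supplied site-level estimate
            filled.append((site_estimates[(h["site"], h["master"])], "site_estimate"))
        else:
            filled.append((None, "unfilled"))
    return filled


# Hypothetical records illustrating each fallback level.
example_horizons = [
    {"site": "S1", "horizon": "Bt1", "master": "B", "db": 1.2},
    {"site": "S1", "horizon": "Bt1", "master": "B", "db": 1.4},
    {"site": "S1", "horizon": "Bt1", "master": "B", "db": None},
    {"site": "S2", "horizon": "Bw", "master": "B", "db": None},
    {"site": "S2", "horizon": "Oa", "master": "O", "db": None},
]
for value, source in gap_fill_db(example_horizons, {("S2", "O"): 0.2}):
    print(source, value)
```

Recording the source of each filled value alongside the value itself makes it straightforward to tally how many horizons were filled at each level, as reported above.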

Data analysis

As is typical in soils datasets, SOC stocks of whole profiles and individual horizons were typically right-skewed in the SOM-MOS and initial characterization datasets; we used ln transformation to address non-normality (Grigal et al. 1991). We ran parametric statistical tests (categorical: ANOVA with Fisher’s Least Significant Difference multiple comparisons, t test; continuous: best subsets, simple or multiple linear regression) on transformed response variables, but present most results as back-transformed means and 95% confidence intervals to aid in interpretation. In some cases, we used non-parametric tests (Kruskal–Wallis or Mann–Whitney) and report results as medians and quartiles. All statistical tests were performed with SigmaPlot (SYSTAT Software, San Jose, CA US).
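As an illustration of how a back-transformed mean and confidence interval relate to the ln-transformed data, the following sketch computes both for a single group. It is not the SigmaPlot procedure used in this study; the function name is hypothetical, and the critical t value is supplied by the caller (for example, 4.303 for a two-sided 95% interval with 2 degrees of freedom) because the Python standard library has no t distribution.

```python
import math
from statistics import mean, stdev


def back_transformed_mean_ci(values, t_crit):
    """Back-transformed mean and confidence limits of ln-transformed data.

    values: positive, untransformed observations (e.g., SOC stocks)
    t_crit: critical t value for the desired confidence level and n - 1 df
    Returns (mean, lower, upper) on the original scale.
    """
    logs = [math.log(v) for v in values]
    log_mean = mean(logs)
    log_se = stdev(logs) / math.sqrt(len(logs))
    # The interval is symmetric on the ln scale and becomes asymmetric
    # once exponentiated back to the original units.
    return (
        math.exp(log_mean),
        math.exp(log_mean - t_crit * log_se),
        math.exp(log_mean + t_crit * log_se),
    )


# Hypothetical stocks (Mg C per ha) for three profiles.
center, lower, upper = back_transformed_mean_ci([50.0, 100.0, 200.0], t_crit=4.303)
print(center, lower, upper)
```

The back-transformed mean of ln values is the geometric mean, which for right-skewed data is smaller than the arithmetic mean of the untransformed observations.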

We selected specific statistical analyses to address our hypotheses. To test H1 and H2, we used one-way ANOVAs to test whether soil taxonomic groups (Order, Suborder, Great Group, Subgroup), soil temperature regime, and moisture regime were significant categorical predictors of variation, followed by Fisher’s tests to identify significantly different groups at continental (H1) and landscape (H2) levels. We performed these tests separately for the non-replicated SOM-MOS (one composited profile per NEON site) and replicated initial characterization (many profiles per NEON site) datasets. We ranked the ability of categorical variables to explain variation among the observations by examining: (1) the proportion of total variation among observations that was explained by each variable [sum of squares between groups/total sum of squares]; and (2) the F statistic associated with each categorical variable, which controls the comparison of variance for the degrees of freedom associated with each categorical variable. To test H3, we used two one-way ANOVAs (one for A, one for B horizons) to test whether land cover/use and parent material groups differed significantly in SOC stocks, again repeating these tests for SOM-MOS vs. initial characterization datasets. To test H4, we used best subsets regressions to identify variables explaining the largest shares of variation in SOC stocks for A and B horizons; similar to other tests, these were conducted separately for the SOM-MOS and initial characterization datasets. Each best subsets model run was allowed to select from a common field of predictor variables, which we selected before running tests based on factors reported in the literature summarized in “Introduction” section. 
In terms of soil physicochemical properties (derived from soil characterization tests), these included particle size distribution (% of mass in sand, silt, and clay size fractions), volumetric coarse fragment content (% coarse fg.), pH in 2:1 water:soil slurry, oxalate extractable aluminum (Al_ox) and iron (Fe_ox) and dithionite-citrate extractable Al and Fe (Al_dith, Fe_dith) as % of mass, ammonium acetate extractable Ca (Ca_NH4) and CEC (CEC_NH4) contents as cmol(+) per kg of soil (Burt et al. 2004). In terms of climate variables, these included site-level mean annual temperature (MAT), mean annual precipitation (MAP), and a climatic leaching index calculated as the difference between MAP and annual evaporation (MAP-ERef). All climate variables were generated with the ClimateNA v5.10 software package, at 4 km resolution and using 30-year means (Wang et al. 2016). To characterize land cover and use (not differentiated), we used the categories of the National Land Cover Dataset (NLCD; Homer et al. 2015), as provided by NEON for each plot from which soils were sampled for the two sampling campaigns. We selected the strongest predictive model in each test by inspecting the adjusted R2 values of each successively larger model (in terms of P predictors) and accepted the largest model that increased the adjusted R2 by at least 0.05 relative to the one with P − 1 predictors. This a priori approach protects against the over-fitting that results from selecting models with many added predictor variables that do little to increase the proportion of explained variation, and which often appeared to suffer serious multicollinearity or spurious relationships in our datasets. In cases where the selected model had variables with strong multicollinearity [variance inflation factors (VIF) > 4.0], we defaulted to the next smallest model without any VIF values > 4.0. 
To test H5, we used the nonparametric Kruskal–Wallis test to assess whether median values of B horizon properties in the initial characterization dataset differed between three pedogenically defined groups at the continental scale, and subsequently the Mann–Whitney test to compare medians (for two of the three pedogenic groups) at each of three NEON sites. To test H6, we used the same best subsets regression process described for H4 three times (once for each NEON site assessed under H5) to identify continuous variables explaining the largest share of variation in B horizon SOC stocks.
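The model selection rule used with best subsets regression can be expressed as a short sketch. It assumes that the best model of each size has already been fitted and summarized by its adjusted R2 and its largest VIF, and it reads the rule literally: take the largest model whose adjusted R2 exceeds that of the next-smaller model by at least 0.05, then step down while any VIF exceeds 4.0. The function name, the tuple layout, and the treatment of the one-predictor model as the default are assumptions.

```python
def select_model(candidates, min_gain=0.05, max_vif=4.0):
    """Pick one model from best-subsets output.

    candidates: list of (n_predictors, adj_r2, largest_vif) tuples for the
        best model of each size, ordered from 1 predictor upward.
    Returns the selected tuple.
    """
    chosen = 0  # assumption: the one-predictor model is the default
    for i in range(1, len(candidates)):
        gain = candidates[i][1] - candidates[i - 1][1]
        if gain >= min_gain:
            chosen = i  # keep the largest model that meets the gain threshold
    # Step down to the next-smaller model without excessive collinearity.
    while chosen > 0 and candidates[chosen][2] > max_vif:
        chosen -= 1
    return candidates[chosen]


# Hypothetical summaries for models with 1 to 4 predictors.
example_candidates = [
    (1, 0.30, 1.0),
    (2, 0.42, 1.8),
    (3, 0.49, 5.2),  # meets the gain threshold but has a VIF above 4.0
    (4, 0.51, 6.0),  # adds only 0.02 to the adjusted R2
]
print(select_model(example_candidates))
```

In this hypothetical case the three-predictor model satisfies the adjusted R2 criterion, but its largest VIF exceeds 4.0, so the two-predictor model is retained.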

In all statistical tests, we set P < 0.05 as the threshold for accepting test results as significant. In addition to assessing statistical significance according to P values, we also assessed whether patterns and statistical significance were robust to alternative models by performing separate tests aimed at each hypothesis. Namely, for hypotheses addressing SOC stocks as a function of categorical variables (H1-H3, H5), we ran nonparametric Kruskal–Wallis and Mann–Whitney tests (as appropriate to the number of groups) on non-transformed SOC stocks for the initial characterization dataset. For hypotheses incorporating both continuous and categorical predictors, we ran best subsets regressions focused on B horizon properties twice: once for the full initial characterization dataset of B horizons and again using only the uppermost B horizon from each initial characterization pedon (most pedons had 1–2 B horizons). The results of these additional tests are not reported in this paper; in general, we found very few (and minor) exceptions to patterns reported in this paper when substituting alternative models. When deviations were noted, they involved minor changes, e.g., in multiple comparisons P values for categorical groups, or the partial t values of individual variables in best subsets models. Overall, we report this here to encourage multi-model analyses as a practice, and emphasize the robustness of our results to alternative analytical frameworks in the remainder of this paper.

Results

Hypothesis 1: patterns and sources of variation in whole profile SOC stocks at the continental scale

At the continental scale, nearly all tested predictor variables were significant sources of variation in whole profile SOC stocks in both the SOM-MOS and initial characterization datasets (Table 1). Among the three climatic predictor variables tested, both datasets indicated that temperature regime explained more of the observed variation in whole profile SOC stocks than drainage class, whether assessed in terms of the proportion of total variance explained (SSb/SSt) or when comparing F statistics to control for the differing degrees of freedom of the three predictor variables. Regarding variation as explained by soil taxonomy, the initial characterization dataset with its high level of within-site replication suggested that finer taxonomic classifications were mostly able to explain larger proportions of total variance in profile SOC stocks. However, comparing F statistics to control for the differing degrees of freedom across taxonomic levels, the coarsest level of classification (soil order) was the strongest predictor. NEON sites also differed significantly from each other in profile total SOC stocks, despite considerable within-site variation in soils present and SOC stocks (Supplementary Tables S1, S2). Assessing between-site differences was not possible with the SOM-MOS dataset, which possessed no within-site replication and also had limited statistical power for testing variation at finer levels of taxonomic classification. Specifically, while SOM-MOS spanned nine orders (compared to ten for the initial characterization dataset), most suborders, great groups, and subgroups had only n = 1 profile, and highly significant P values reflected large differences between a small number of highly different mean values for certain soil taxa (Fisher’s test comparisons, results not shown).
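The two ranking metrics compared above, SSb/SSt and the F statistic, can be computed from grouped observations as in the following sketch. The function name and example values are hypothetical; in this study the corresponding tests were run in SigmaPlot on ln-transformed stocks.

```python
from statistics import mean


def one_way_anova_summary(groups):
    """Return (SSb/SSt, F) for a one-way layout.

    groups: dict mapping each category (e.g., a temperature regime) to a
        list of observations (e.g., ln-transformed profile SOC stocks).
    """
    all_values = [v for obs in groups.values() for v in obs]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values()
    )
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Dividing each sum of squares by its degrees of freedom is what lets
    # F be compared between predictors with different numbers of groups.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between / ss_total, f_stat


# Hypothetical two-group example.
example_groups = {"cold": [4.0, 5.0, 6.0], "warm": [1.0, 2.0, 3.0]}
print(one_way_anova_summary(example_groups))
```

A predictor with many categories can raise SSb/SSt simply by splitting the observations into more groups, whereas F penalizes the additional degrees of freedom, which is why the two metrics can rank predictors differently.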

Table 1 Categorical sources of variation in profile total SOC stocks at the continental scale in the SOM-MOS (left) and initial characterization (right) datasets

The SOM-MOS and initial characterization datasets both spanned the full range of temperature regimes in USDA Soil Taxonomy, and temperature regime explained 70 and 39% of the total variance in profile total SOC (as SSb/SSt) for the two datasets, respectively. Both datasets showed the same general pattern (Fig. 2), with the largest profile SOC stocks at the cold end of the spectrum, the smallest at the warm end, and many significant differences between intermediate temperature regimes, although these differed slightly between datasets. The SOM-MOS dataset, with its far smaller number of total profiles, had wider 95% confidence intervals than the initial characterization dataset for every temperature regime, though the confidence intervals of the two datasets overlapped in every temperature regime. In terms of specific patterns, the larger sample size of the initial characterization dataset showed that soils with gelic temperature regimes [mean annual soil temperature (MAST) of < 0 or < 1 °C, depending upon level of taxonomic classification] had the largest profile SOC stocks. Cryic and frigid temperature regimes, possessing MAST of 0–8 °C (but frigid regimes possessing more intra-annual variation) had similar profile total SOC stocks, which were in turn larger than the remaining temperature regimes. Soils with mesic temperature regimes (MAST > 8, < 15 °C) had significantly larger profile SOC storage than the thermic (MAST > 15, < 22 °C) and hyperthermic (> 22 °C) regimes, which did not differ from each other.

Fig. 2

Whole soil profile SOC storage, to a depth of refusal or 1 m, for the six soil temperature regimes in USDA Soil Taxonomy as represented across NEON sites. Open squares, large error bar whiskers, and lower case letters denote back-transformed means, 95% confidence intervals, and significant differences between temperature regimes for profiles from the SOM-MOS dataset, which were not replicated within sites. Filled circles, small error bar whiskers, and capital letters denote the same for the initial characterization dataset, in which many profiles were sampled within each NEON site. The number of profiles for SOM-MOS and initial characterization datasets, respectively, are given parenthetically for each temperature regime

Natural drainage classes incorporate the climatic and topographic conditions under which a soil has formed, as mediated by soil hydraulic properties, and are described in reference to the depth to, and the frequency and duration of, internal free water. These classes are designated at the pedon level, and as hypothesized were a significant predictor of variation in whole profile SOC stocks in both the SOM-MOS and initial characterization datasets (Fig. 3). In both datasets, drainage was a stronger predictor of observed variation than soil moisture regime (Table 1), which very specifically characterizes the amount of soil moisture as it varies seasonally within a typical year for some superficial portion of the pedon (the control section), in reference to temperature and biological activity. The internally replicated (within NEON sites) initial characterization dataset spanned a wider range of natural drainage classes overall, possessed larger sample sizes within each, and revealed significant differences between most groups. Namely, very poorly drained (free water very shallow, persistent to permanent), poorly drained (free water shallow or very shallow, common to persistent), and somewhat poorly drained soils had the largest, second-, and third-largest mean profile total SOC stocks, respectively. Profile total SOC stocks of moderately well drained soils (free water moderately deep, transitory to persistent) ranged into those of well-drained soils (free water deep to very deep), which in turn differed from the smallest mean values for somewhat excessively and excessively drained soils, which have very deep, very rare occurrences of free water and high to very high hydraulic conductivity. None of the SOM-MOS soils possessed very poorly drained classifications; among those represented, poorly drained soils differed from all others, which did not differ from each other. 
Similar to soil temperature regimes, 95% confidence intervals for whole profile SOC stocks among SOM-MOS soils were much wider than for initial characterization soils, but overlapped for all mutually sampled drainage classes.

Fig. 3

Whole soil profile SOC storage, to a depth of refusal or 1 m, for the seven natural drainage classes in USDA Soil Taxonomy as represented across NEON sites. See Fig. 2 caption for explanation of sample sizes, means, confidence intervals, and multiple comparisons for the two datasets. Note x-axis break

Soil taxonomic classifications reflect a multitude of factors influencing the development and hence morphology and properties of soils. Thus, soil taxonomic patterns in profile SOC (Table 1) are not independent of the presented climatic patterns (Figs. 2, 3). However, soil taxonomy also incorporates additional factors, and provides a framework for differentiating soils at hierarchically more specific levels. Considering the two datasets and four levels of classification tested here (Table 1), the coarsest (soil order) provided the best overall performance in terms of explaining total variation in SOC stocks between soil profiles across NEON sites (SSb/SSt), while avoiding problems with overfitting or limited within-group replication (assessed in terms of F statistics). In terms of specific patterns, the initial characterization dataset showed significant differences in profile total SOC between many soil orders (Fig. 4), with values ranging from very large means (and variances) for the frozen and often saturated Gelisols and mostly saturated Histosols to small mean values for the widely distributed, weakly developed Entisols (82 Mg C ha−1) and the Aridisols (23 Mg C ha−1). Intermediate values, many of them significantly different, were observed for other mineral soil orders. The SOM-MOS dataset revealed fewer differences between orders, but where these were detected they followed the same patterns as observed across the initial characterization dataset (Fig. 4).

Fig. 4

Whole soil profile SOC storage, to a depth of refusal or 1 m, for the ten orders (of 12 total) in USDA Soil Taxonomy represented across NEON sites. See Fig. 2 caption for explanation of sample sizes, means, confidence intervals, and multiple comparisons for the two datasets

Hypothesis 2: patterns of variation in whole profile SOC stocks at the landscape level

Three NEON sites possessed localized drainage sequences with reasonably strong replication across drainage classes at the landscape level, enabling a test of Hypothesis 2 with the initial characterization dataset (Fig. 5). These sites included one in the western Great Lakes basin (TREE; Wisconsin) and two on the Outer Coastal Plain, in Florida (OSBS and TALL). The only site spanning the full range of seven natural drainage classes (OSBS) most closely approximated the continental pattern of increasing profile SOC with increasing wetness, albeit with fewer significant differences between drainage classes. The other two sites possessed soils from only four of the seven drainage classes; at these, the most poorly drained soils on the landscape tended to have more variable profile total SOC, but only at one of them (TREE) was profile SOC significantly greater in the most poorly drained class than in all other drainage classes.

Fig. 5

Whole soil profile SOC storage, to a depth of refusal or 1 m, for the USDA natural drainage classes observed at three NEON sites with internal gradients in natural drainage. Plots show means and 95% confidence intervals after back-transformation from the ln-transformed values used for analysis. Lowercase letters denote significant differences between ln-transformed values. Note x-axis breaks
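The back-transformation described in this caption can be sketched as follows. This is a hedged illustration rather than the procedure actually used for the figures: the function name is hypothetical, and the interval uses a normal quantile as a large-sample stand-in for the Student t quantile that would normally be preferred for small groups.

```python
import math
from statistics import NormalDist, mean, stdev


def back_transformed_ci(stocks, level=0.95):
    """Mean and confidence interval computed on the ln scale, then
    back-transformed to the original units (e.g., Mg C ha-1).

    Assumption: a normal quantile approximates the t quantile; for a
    small sample the true interval would be somewhat wider.
    """
    logs = [math.log(s) for s in stocks]
    center = mean(logs)
    std_err = stdev(logs) / math.sqrt(len(logs))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (
        math.exp(center),                # geometric mean of the stocks
        math.exp(center - z * std_err),  # lower bound
        math.exp(center + z * std_err),  # upper bound
    )


# Hypothetical profile SOC stocks for one drainage class.
print(back_transformed_ci([55.0, 80.0, 120.0, 310.0]))
```

Because the interval is symmetric only on the ln scale, the back-transformed bounds are asymmetric around the plotted mean, and that mean is a geometric rather than an arithmetic mean.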

Hypothesis 3: SOC stocks in A vs. B horizons as related to land cover vs. parent material

Topsoil (A horizon) and illuvial (B) horizon SOC storage varied significantly with land cover/use and parent material, for both SOM-MOS and initial characterization datasets (Table 2). However, our specific hypotheses were not consistently supported by the results. In A horizons, land cover/use was a stronger predictor of variation in SOC storage than was parent material for both datasets, both in terms of the proportion of total variance explained and the F statistics of the two predictors. In B horizons, lower proportions of the total variance were explained by these two predictors, and there was no clear pattern of either variable being a stronger predictor of observed variation than the other. Namely, parent material explained somewhat larger proportions of variation than land use in terms of the fraction of total variation between observations, but at the expense of degrees of freedom (i.e., a much larger number of categorical groups than land cover/use classes).

Table 2 SOC stocks for A (above dotted line) and B (below dotted line) horizons, for SOM-MOS (left) and initial characterization (right) datasets, as related to land cover/use vs. parent material

Hypothesis 4: continuous variables influencing A and B horizon SOC stocks

Despite originating from the very same study sites and possessing the same pool of potential predictor variables, the SOM-MOS and initial characterization datasets diverged in the continuously varying factors best able to explain the variation observed in A and B horizon SOC stocks (Table 3). Best subsets models within the non-replicated SOM-MOS dataset nominated climatic drivers as those explaining the most variation, with MAT appearing first for both horizons. Mean annual precipitation and dithionite-citrate extractable Al concentrations additionally appeared in the strongest multi-variate A horizon model that met our criteria. Model predictive capacity, in terms of proportion of variance explained, was higher for the initial characterization dataset (R2 = 0.51 and 0.30 for A and B horizons, respectively, vs. 0.35 and 0.25 for the SOM-MOS dataset). The initial characterization dataset, which spanned the same broad climatic gradients but also replicated across soil physicochemical properties within each NEON site’s climate, suggested a stronger influence of variation in soil physicochemical properties, while climate variables were not present in the strongest models that met our criteria. In A horizons, pH values predicted the largest proportion of the variation in observed SOC stocks, with CEC, exchangeable Ca and Al concentrations, and sand content explaining the successive remaining shares of residual variation. Cation exchange capacity and oxalate-extractable Al concentration were also selected in the B horizon model, which further included coarse fragment and silt contents.
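The logic of a best subsets search can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the model selection procedure behind Table 3: it ranks ordinary least squares models by adjusted R2 alone, whereas the selection criteria applied in this study are not reproduced here. All function and variable names are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with
    partial pivoting (adequate for small, well-conditioned systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_r2(y, columns):
    """Fit OLS with an intercept via the normal equations and return
    adjusted R2 for the given predictor columns."""
    n, p = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p + 1)]
           for a in range(p + 1)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y))
           for a in range(p + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def best_subsets(y, predictors, max_size=3):
    """Return {subset size: (adjusted R2, predictor names)} holding the
    highest-scoring subset of each size."""
    best = {}
    for size in range(1, max_size + 1):
        for names in combinations(sorted(predictors), size):
            score = adjusted_r2(y, [predictors[name] for name in names])
            if size not in best or score > best[size][0]:
                best[size] = (score, names)
    return best
```

With `y` holding ln-transformed horizon SOC stocks and `predictors` mapping names such as `"MAT"` or `"pH"` to equal-length lists, `best_subsets(y, predictors)` returns the strongest model of each size. An exhaustive search of this kind evaluates every combination, so it identifies which variables covary most closely with SOC in a given dataset without establishing that they control it.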

Table 3 Sources of continuous variation in A and B horizon SOC storage in the SOM-MOS (left) and initial characterization (right) datasets. For each dataset and predictor variable, the between-group degrees of freedom (df_b) and sum of squares (SSb), total sum of squares (SSt), proportion of explained variation (as SSb/SSt), and F and P statistics are shown

Hypothesis 5: pedogenic variation in B horizon properties and SOC stocks at the continental scale

Considering all B horizons from the continental-scale, internally replicated initial characterization dataset revealed that nearly all measured physicochemical properties differed significantly between argillic/kandic, cambic, and spodic groups (Table 4; Kruskal–Wallis tests). The only exceptions were silt contents of argillic/kandic (n = 548) vs. cambic horizons (n = 295) and coarse fragment contents of spodic (n = 110) vs. cambic horizons. In general terms, spodic horizons tended to be coarse-textured, acidic, high in OC and extractable metals, and low in bulk density, extractable calcium, and CEC. In contrast, argillic and kandic horizons were the densest, finest-textured, highest in Ca and CEC, and lowest in extractable metals and OC. Compared to these extremes, cambic horizons were intermediate in most regards but had notably high CEC. Median SOC stocks were significantly different for all three groups, being 16 Mg C ha−1 for argillic/kandic horizons, 23 Mg C ha−1 for spodic horizons, and 30 Mg C ha−1 for cambic horizons.

Table 4 Physicochemical properties of illuvial (B) horizons falling into three major pedogenic groups: clay-enriched argillic and kandic horizons (B horizons with t and k designations), incipient cambic horizons (w designations), and organo-metal spodic horizons (s and h designations)

In keeping with their significant differences in nearly all physicochemical parameters, the three types of illuvial horizons in the initial characterization dataset differed in the variables that best explained variation in their SOC stocks, although several predictors were consistent across two of the three groups (Table 5). Predictors of SOC storage in argillic and kandic horizons, which were 58% of all B horizons in the continental-scale dataset, were the same in identity, rank, and sign as for B horizons overall in the dataset. These included CEC, oxalate extractable Al, coarse fragment, and silt contents. Cation exchange capacity was also the most significant predictor of SOC storage for cambic horizons, followed by climatic leaching index. In spodic horizons, B horizon SOC was related to oxalate extractable Al, pH, clay and coarse fragment contents, and MAP.

Table 5 Sources of variation in SOC storage of B horizons from the continental-scale distributed plot dataset falling into three pedogenic groups

Hypothesis 6: pedogenic variation in B horizon properties and SOC stocks within heterogeneous soil landscapes

Three NEON sites, one each in the Appalachian (MLBS), Rocky (NIWO), and Sierra Nevada (SOAP) Mountains, provided well-replicated internal contrasts between soils with argillic (Bt/Bk) horizons vs. cambic (Bw) horizons. This observational design afforded the opportunity to test whether B horizon SOC stocks and relationships with soil physicochemical properties followed the same patterns within these local landscapes as observed across the continental-scale dataset. In contrast to the many significant differences in physicochemical properties between argillic and cambic B horizons observed at the continental scale (cf. Table 4), there were few significant differences between argillic and cambic horizons at these three sites (Table 6). Specifically, clay differed significantly between groups at two of the three sites, and pH, extractable Al, and extractable Ca contents differed significantly at one site each. Categorically, B horizon SOC stocks of the two pedogenic groups were significantly different at two of the three sites, but best subsets model selection identified continuously varying physicochemical properties as better predictors of variation in B horizon SOC stocks. However, the factors that best explained variation in B horizon SOC stocks differed depending on site (Fig. 6). The strongest model meeting our criteria at each site (each had P < 0.001) was a three-variable model, with adjusted R2 values ranging from 0.62 (NIWO) to 0.83 (MLBS). At MLBS, oxalate-extractable Al explained the largest share of variation in B horizon SOC (partial t = 8.186), with residual variation explained by coarse fragment (t = −2.195) and clay (t = −1.910) contents, respectively. At NIWO, variation in SOC was most strongly related to CEC (partial t = 3.903), followed by clay and extractable Al contents (t = −2.898 and t = 2.700, respectively). At SOAP, B horizon SOC stocks were positively related to their sand (t = 6.650), extractable Fe (t = 4.686), and Ca (t = 3.192) contents.

Table 6 Paired comparisons of B horizon SOC storage and physicochemical properties for three NEON sites possessing internal contrasts of argillic (Bt/Bk) vs. cambic (Bw) horizons
Fig. 6

Relationships between B horizon physicochemical properties and SOC stocks (ln-transformed) for three NEON sites. Points represent individual B horizon samples. Plots in the left column show relationships between SOC stocks and the physicochemical property explaining the largest share of their variation; plots in the center and right columns show relationships between each additional physicochemical property and the residual variation remaining after accounting for the predictor variable(s) to the left. Best-fit lines are added as a visual aid; slopes and partial P values correspond to the coefficients returned by the strongest best subsets regression model for each site

Discussion

The NEON design, spanning continental gradients across sites, and landscape-level variation within them, affords unique opportunities to test hypotheses as a function of scale and study design, using two complementary datasets differing in their degree of within-site replication. Table 7 enumerates our hypotheses, their scales of testing and degree of support, the implications of these results for studies at NEON sites and similar cross-scale networks, and inferences into how soils function. Our overarching result, across all of these hypotheses, is that SOC patterns and predictors observed at one scale (or using one design) do not necessarily transfer to others.

Table 7 Summary of hypothesis tests

Our first two hypothesis tests exemplify the dissonance that can occur when soils arrayed across a sequence on a landscape are expected to follow the trend observed across the same gradient at a larger scale. Profile total SOC stocks for soils differing in drainage at TREE, OSBS, and TALL (H2, Fig. 5) highlight that even landscapes with many similarities (i.e., humid climates, low relief, hydraulically conductive unconsolidated parent materials) can diverge from a strong pattern with a straightforward explanation (H1, Fig. 3). Importantly, few sites (whether NEON or otherwise) span the full range of drainage classes, and while each of the three we assessed shows a grossly similar qualitative trend, not one shows the clearly significant pattern that we observed at the continental scale, or that others have reported at regional and continental scales (Dalsgaard et al. 2016; Davidson 1995; Davidson and Lefebvre 1993; Wills et al. 2013). Thus, expecting drainage sequences to align neatly with SOC stocks on any landscape is unwarranted, despite the convincing pattern from the large-scale dataset, which has been documented in individual site-level studies elsewhere (Raymond et al. 2011; Richardson and Stolt 2013; Webster et al. 2008). In terms of implications for SOC studies across scales, testing our first two hypotheses revealed that within-site replication is not needed to discern drainage, climatic, and soil taxonomic patterns in SOC at wide scales and for whole soil profiles. The common inference among these patterns is thus the ability of meaningful categories to encompass continuous variation in soil forming factors and processes, and in turn differentiate their SOC stocks. Ultimately, the factors and processes that govern the formation of soils as natural bodies interact and covary in fundamental ways that cannot be statistically disentangled.
This reality makes categorization (e.g., as with soil taxonomy) a useful way to manage complexity, while acknowledging that categories include factors that may work synergistically (e.g., coldness and wetness) to produce strong patterns in observational datasets such as we have assembled across NEON sites.

Our latter three hypothesis tests demonstrate the implications that study designs have for the inferences that they can support, in terms of patterns and their transferability across scales. The first of these (H4, Table 3) demonstrated how broad-scale designs that do not replicate within sites can capture variation in SOC stocks as related to factors varying at such broad scales (e.g., climate), while failing to detect the influence of factors varying at broad and local scales (e.g., soil physicochemical properties). In this case, comparing results from the initial characterization dataset to those of the SOM-MOS dataset effectively controls for the broad climatic variation detected by the latter, in that climate variables do not vary meaningfully at the scale of a NEON site while soil physicochemical properties do. In terms of inference, this suggests that the apparent overarching influence of climate may in fact be less important than continuously varying soil physicochemical properties, to the extent that these factors are independent of each other. Our Hypothesis 5, focused on B horizons in particular, placed that continuous variation in physicochemical properties within three pedogenically defined groups at the continental scale (Tables 4, 5). The consistent differences in properties, and the relationships of unique physicochemical properties to SOC in each of those groups may superficially seem to provide a pedogenic basis for SOC variation at more localized scales, much like the hypothesized wetness – profile SOC relationship (H2). However, in similar fashion, testing this hypothesis (H6) at the landscape level at MLBS, NIWO, and SOAP revealed that pedogenic differences in B horizon properties and SOC stocks were far fewer at landscape than continental levels (Table 6, Fig. 6). Soils from the Bt/Bk vs. Bw pedogenic groups at each of these sites not only failed to exhibit most of the physicochemical differences observed at the continental scale, but the specific physicochemical factors that predicted SOC stocks differed across the three sites.

The SOM literature emphasizes the importance of organo-metal interactions to the formation of relatively stable SOM, and their positive relationships with SOC concentrations (Kleber et al. 2007; Kogel-Knabner et al. 2008; Rasmussen et al. 2018). Observational relationships between extractable forms of Al and SOC storage in A and B horizons at the continental scale (H4, Table 3) support this, to the degree that operational dissolutions target meaningfully discrete reactive forms of soil metals (Heckman et al. 2018a, b; Wagai et al. 2013). However, SOC relationships with Al and other physicochemical properties (e.g., pH, CEC) that we report are to some degree a product of covariance, and thus do not indicate a controlling influence of any one or three soil properties over SOC storage. That said, this covariance could reasonably be expected to be stronger in some soils (or portions of the profile) than others, and best subsets regressions (results not shown) demonstrated this was the case. For example, across all initial characterization A horizons, % organic C explained 46% of the variation in CEC, with the addition of % clay bringing the explanatory power of the model to 80%. In the lower organic matter B horizons, this order was reversed, with % clay explaining 36% of variation in CEC, and the addition of % organic C resulting in a model with adjusted R2 of 0.60.

The inability of observational statistics to disentangle integrative soil properties is not a new problem, nor is it particularly important when considering that physicochemical properties such as pH, base cation, and metal concentrations are temporally dynamic (McLaughlin 2014; SanClements et al. 2009; Thompson et al. 2006). A soil extraction from one point in time may therefore misrepresent the conditions under which that soil has formed, and fail to reveal climate, biogeochemical, and other factors that influence SOC stocks that have accumulated over longer timescales (Delgado-Baquerizo et al. 2017; Slessarev et al. 2016). Radiocarbon measurements indicate that most soils hold C that has been cycling over centuries to millennia, especially in B horizons (Heckman, this issue; He et al. 2016; Schrumpf et al. 2013; von Lutzow et al. 2006), implying that relationships between SOC and state factors as they are observed today may be at best coincidental, and at worst misleading. Paleo-disciplines (e.g., ecology, climatology, pedology) all indicate that over time scales relevant to soil C turnover, many soils have experienced wide variation (or directional changes) in factors such as climate and vegetation (Commerford et al. 2016; Shuman and Marsicek 2016). For example, in climates that are today semi-arid or drier, clay-enriched horizons (e.g., Bt, Bk) have been interpreted as relicts of past wetter climates needed to explain physical mechanisms of clay translocation and accumulation (Bockheim and Hartemink 2013; Elliott and Drohan 2009; Lavkulich and Arocena 2011). Soils in forest-grassland ecotones, often comprised of Alfisol-Mollisol complexes (Eckmeier et al. 2007; Krupenikov et al. 2011) provide another set of examples, in which mechanisms of SOM stabilization associated with these dramatically different vegetation types and disturbance regimes can both be detected (Masiello et al. 2004). 
Notably, soil properties developed over long-term pedogenesis can influence SOC concurrently over the course of that long-term development; they can also pre-dispose soils to stabilize contemporary C inputs according to distinct mechanisms (Sanderman 2018). These are not mutually exclusive, and because our analysis addresses patterns in stocks rather than mechanisms of stabilization, we stress the operational disconnect between snapshot soil extractions and century to millennial soil formation and 14C ages. Our intent is to highlight that the study of soils and SOC stocks depends largely upon observational approaches, and within that context, the challenge of disentangling the factors and properties of natural systems (soils) that are by definition multi-factor.

Top-down (e.g., land cover/use) and bottom-up (e.g., parent material) influences on SOC storage in A and B horizons exemplify factors that may act over short- vs. long-term timescales. Regarding our third hypothesis, SOM-MOS and initial characterization datasets concurred that parent material influences SOC stocks in A horizons (Table 2). This was despite an apparently greater sensitivity to changes in surface processes, inputs, and disturbance regimes, as inferred through our observational statistical results for land cover/use and as supported by literature (Angst et al. 2018; Barre et al. 2017; Mao et al. 2020). In B horizons, variation in SOC stocks at NEON sites was generally less predictable, with both categorical predictors providing significant explanatory power but neither obviously explaining a larger share of the (still largely unexplained) variation.

Coarse fragment content is a continuously varying soil property inherited from parent material with a significant influence on fine soil SOC storage in B horizons at continental and landscape scales (Tables 3, 5, Fig. 6). This finding raises two important points for consideration. First, though rocks are more often considered a sampling problem, a particle size class to be excluded during sample processing, or a mathematical term in an element stock computation, they may be a widespread underlying driver of SOC storage. Rocks may thus warrant more attention in SOC inventories, which often do not address them explicitly. Second, relationships between rocks and SOC stocks exemplify the inability to generate mechanistic explanations from the observational studies that are the norm in the SOM literature. In this case, plausible explanations can be proposed for several mechanisms by which rocks could increase fine soil SOC stocks. By decreasing the available volume of fine soil, rocks may concentrate rooting, OM inputs, and biogeochemical and pedogenic processes in the fine soil (Harrison et al. 2003; Pierret et al. 2016). Rocks are also a source of primary minerals and limiting nutrients, and careful excavation often reveals preferential root and rhizosphere activity in rocky soil volumes or in the coarse fragments themselves (Arredondo et al. 2019; Hoffland et al. 2002; Fahey et al. 2017). Coarse fragments also create physical surfaces for accumulation of clay films and organo-metal complexes; as these coatings thicken into the surrounding fine soil matrix, they may increase overall SOC contents (Bockheim 2011; Frazier and Graham 2000; Schaetzl 1996; Sommer et al. 2001; Stolt et al. 1993). These three potential explanations for coarse fragment-SOC relationships are not mutually exclusive; each is supported by literature, yet none can be offered as a stand-alone explanation for the observed pattern.
Accepting, then, that each soil is the integration of myriad processes and factors acting across scales and changing through time, these explanations for just one of our results only raise more questions. Why should any one, or even five, factors be denoted as “controlling” SOC storage? How (if at all) do mechanisms of SOM stabilization relate to the amount of SOC stored in soils? Our results overall (and for coarse fragments specifically) indicate that the factors most closely related to SOC stocks depend very much on time and place. Ultimately, soils are unique down to scales finer than their variation can be feasibly sampled, and the challenge to applying the findings of any inventory lies in deciding the scale at which the dissonance between broad pattern and site specificity is unacceptable.
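The role of coarse fragments as "a mathematical term in an element stock computation" can be made concrete with the conventional stock equation. The sketch below is an illustration under stated assumptions, not the computation used to derive the stocks reported here: it assumes organic C is expressed as a mass percentage of the fine earth, bulk density refers to the fine earth (g cm−3), and coarse fragments are expressed as a volumetric fraction. Function names and dictionary keys are hypothetical.

```python
def horizon_soc_stock(oc_percent, bulk_density, thickness_cm, coarse_fraction):
    """SOC stock of a single horizon in Mg C ha-1.

    (OC% / 100) * BD [g cm-3] * thickness [cm] gives g C cm-2, and
    1 g cm-2 equals 100 Mg ha-1, so the two factors of 100 cancel.
    The (1 - coarse_fraction) term discounts the volume held by rocks.
    """
    if not 0.0 <= coarse_fraction < 1.0:
        raise ValueError("coarse_fraction must be in [0, 1)")
    return oc_percent * bulk_density * thickness_cm * (1.0 - coarse_fraction)


def profile_soc_stock(horizons, max_depth_cm=100.0):
    """Sum horizon stocks from the surface to max_depth_cm, truncating
    any horizon that crosses the depth limit."""
    total = 0.0
    for h in horizons:
        counted = min(h["bottom_cm"], max_depth_cm) - h["top_cm"]
        if counted <= 0:
            continue  # horizon lies entirely below the depth limit
        total += horizon_soc_stock(
            h["oc_percent"], h["bulk_density"], counted, h["coarse_fraction"]
        )
    return total


# Hypothetical two-horizon profile.
profile = [
    {"top_cm": 0, "bottom_cm": 10, "oc_percent": 2.0,
     "bulk_density": 1.2, "coarse_fraction": 0.25},
    {"top_cm": 10, "bottom_cm": 120, "oc_percent": 0.5,
     "bulk_density": 1.5, "coarse_fraction": 0.0},
]
print(profile_soc_stock(profile))
```

In this formulation rocks can only lower the computed stock of a horizon, all else being equal. A positive statistical relationship between coarse fragment content and fine soil SOC therefore has to arise through the other terms, chiefly organic C concentration, which is consistent with the concentrating mechanisms proposed above.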

Conclusions

By analyzing two complementary soil datasets from 40 NEON sites at continental and landscape scales, we have shown that patterns and predictors of SOC stocks depend upon design and scale, and that inferences gained from one design (or at one scale) do not necessarily transfer to the other. Climate patterns are robust at the continental scale, regardless of within-site replication, but within-site replication reveals the influence of landscape-level variation in soil physicochemical properties on SOC stocks. Strong continental patterns, such as larger profile SOC stocks for poorer drainage classes or increasing B horizon SOC as related to extractable metals, are only marginally transferable to individual landscapes with narrower gradients. At localized levels, the predictive utility of categorical schemes is exceeded by continuous variation in physicochemical factors, even when the latter are differentiated into meaningful groups by the former.