Origin of carbon in agricultural soil profiles deduced from depth gradients of C:N ratios, carbon fractions, δ13C and δ15N values

Agricultural soils in Germany store 2.54 Pg of organic carbon (C). However, information about how and when this C entered the soils is limited. This study illustrates how depth profiles of organic matter can shed light on different entry paths of organic C. Machine learning was used to explain total organic C (TOC), C:N, particulate organic C (POC), δ13C and δ15N values down to 100 cm depth based on pedology, geology, climate and management-related variables from the German Agricultural Soil Inventory. We estimated TOC turnover rates based on the relationship between the proportion of maize (only C4 plant) in crop rotations and soil δ13C values. In the upper 30 cm of cropland, fresh photosynthates added on average 0.2 to 0.8 Mg C ha− 1 year− 1. Organic fertiliser was another source of topsoil C, especially in grassland. Sandy sites in north-west Germany contained historic C from past heathland and peatland. One third of German agricultural land was found to be on colluvial and alluvial deposits, in which allochthonous C from upstream and upslope areas evidently increased the TOC content of subsoils. In and below hardpans, TOC content and C:N and POC:TOC ratios were low, indicating restricted root-derived C input. Our data indicate that ongoing management in German agricultural soils mainly affects topsoil C, while C storage in subsoils reveals significant legacies from allochthonous, buried or translocated C inputs. Specific attention should be focused on the sustainable loosening of hardpans that could result in a slow, but significant increase in subsoil C stocks.


Introduction
Globally, soils contain about 1,500 Pg of carbon bound within soil organic matter in the upper 100 cm (Batjes 1996). This is about twice the amount of carbon (C) that currently resides in the Earth's atmosphere as carbon dioxide (IPCC 2013). Nevertheless, soil organic C content tends to decrease with greater depth, following the depth distribution of organic C input and its turnover. In agricultural soils, primary organic C inputs may originate from: (i) aboveground biomass that is not harvested, such as stubble, mulch and green manure, (ii) organic fertilisers including animal excreta, compost and biogas digestates, and (iii) root litter and rhizodeposits. The first two sources of organic C enter soil at the surface, with redistribution along the soil profile possibly only occurring via bioturbation, tillage and/or leaching of mobile organic C species. Rootderived C, however, enters soil directly belowground to a depth of one metre or more (Canadell et al. 1996). The relative contribution of aboveground biomass, organic fertiliser and roots to total C inputs varies considerably between different land uses and management regimes. Grassland, for instance, receives more C input via roots and rhizodeposits than cropland (Pausch and Kuzyakov 2018), while cropland receives a larger C input from aboveground in the form of harvest residues and stubble.
After C has entered the soil, it is prone to a range of transformation processes (Kögel-Knabner and Amelung 2014). Transformation results in significant contributions of microbial debris to total organic C (TOC) (Appuhn and Joergensen 2006;Liang et al. 2019;Miltner et al. 2012). Microbial transformations of TOC are usually accompanied by stable isotope discrimination processes. Undecomposed photosynthates from annual crops and grasses typically show δ 13 C signals from − 26 to -30‰ for C 3 plants and − 10 to -14‰ for C 4 plants (Philp and Monaco 2012), as well as δ 15 N signals from − 8 to 9‰ depending on the nitrogen source (Craine et al. 2015). During microbial processing, the organic C fraction is enriched with heavy isotopes of C and N, which is reflected in increasing δ 13 C and δ 15 N values (Boutton 1996;Natelhoffer and Fry 1988). If organic material is processed and decomposed, its C:N ratio also decreases from > 15 to the C:N ratio of microbes ranging from 5 to 8 (Amelung et al. 2018). This makes δ 13 C, δ 15 N and C:N values proxies for characterising the degree of microbial processing of organic C and detecting the presence of relatively undecomposed litter. The latter can also be quantified directly by density fractionation, which separates soil organic C into particulate organic C (POC) and mineralassociated C (MOC).
To date, most work on C fractions, δ 13 C and δ 15 N values, and C:N ratios has been conducted at field or sub-field scale and focused on topsoil because: (i) organic matter in topsoil is more sensitive to land use or management treatments than subsoil, and (ii) measuring and detecting changes in POC, δ 13 C, δ 15 N and C:N signatures of subsoil can be analytically challenging as subsoil typically contains much less organic matter than topsoil. Although globally more than half of the soil organic C in the upper metre is stored below 30 cm depth (Batjes 1996), much less is known about the origin of C in subsoil than in topsoil. Carbon input into subsoils may come from the surface soil and enter subsoil via leaching (Kindler et al. 2011), turbation (bio-, cryo-, or peloturbation), or burial of topsoil C, depending on pedogenesis and climatic regions. However it may also come from the soil parent material, especially if the soil developed from relatively young and C-rich colluvial deposits , or directly from deep roots and their rhizodeposits.
Radiocarbon ages in soil profiles suggest that subsoil C-cycling is much slower than topsoil C-cycling. Once C has entered the subsoil, much of it seems to reside there for millennia (Balesdent et al. 2018). Crops deposit about one third of all root C below 30 cm depth (Jackson et al. 1996), but this number varies considerably in time and space. Given the long residence time of subsoil C, increasing root-derived C input in the subsoil on a large scale could remove significant amounts of C from the atmosphere. In this context, the breeding of plants with deep and bushy roots has been proposed as a negative emission technology for counteracting greenhouse gasinduced global warming (Kell 2011;2012). However, there are various soil properties that restrict the growth of deep roots, such as compacted soil layers, which are commonly referred to as hardpans (Lynch and Wojciechowski 2015). Oldeman et al. (1991) projected the global extent of hardpans to be 68 million hectares and attributed hotspots of soil compaction in Europe to heavy machinery. For Germany, Schneider and Don (2019) recently estimated half of all cropland to be compacted due to either pedogenetic causes (37% of cropland) or anthropogenic causes (13% of cropland). However, the effect of hardpans on subsoil C has received limited scrutiny to date.
A better understanding of the origin of C in agricultural soils could improve the design of future climatesmart management practices for enhanced C sequestration. Here, we hypothesised that depth gradients of soil organic C reflect past organic C inputs and allow to track the contribution of deep roots to subsoil C, unless subsoils contain large amounts of topsoil C or parent material-inherent C. We assumed that deep input of root C correlates positively with the depth gradients of TOC, C:N and POC:TOC, but negatively with 13 C: 12 C and 15 N: 14 N abundances, as elevated portions of undecomposed root material in the subsoil should also be reflected in lower δ 13 C and δ 15 N values. We also assumed the depth distribution of root-derived C to be controlled by soil properties and management. We use Germany as a model country, aligning novel data with data from the first German Agricultural Soil Inventory  to evaluate the depth distribution of soil organic matter with respect to recent and historic C inputs into agricultural soils at regional scale. Specifically, our aims were to: & describe the depth gradients of TOC, C:N ratio, POC, δ 13 C and δ 15 N in the upper metre and relate the observed patterns to physico-chemical characteristics of the soil matrix, parent material, climate, land use and/or management using a data-mining approach. & investigate entry pathways and turnover of organic C in topsoils and subsoils. & estimate the effect of hardpans in croplands on rootderived C input in subsoil.

Study area
This study focused on agricultural soils in Germany, which cover an area of 166,451 km 2 (Destatis 2019). About 70% of this area is used for annual crops (117,309 km 2 ), 28% is used as permanent grassland (47,134 km 2 ), and the remaining agricultural land is used for perennial crops (Destatis 2019). Based on FAOSTAT (2019), the top three crops grown between 2009 and 2018 were wheat (Triticum aestivum L., mean grain yield: 7.6 t ha − 1 ), followed by barley (Hordeum vulgare L., mean grain yield 6.5 t ha − 1 ) and canola (Brassica napus L., mean grain yield 3.7 t ha − 1 ). German grassland is typically managed intensively and includes 1.4 times more pasture than meadow (Destatis 2019).

Sampling design & data acquisition
This study was based on the dataset from the first German Agricultural Soil Inventory, which comprises information on the soil and management at 3,104 sites covering all agricultural land in Germany in a regular 8 km x 8 km grid . At each site, soil profiles were dug to 100 cm depth. Soil morphology was characterised based on soil horizons following AD-HOC-AG Boden (2005), while composite soil samples for laboratory analysis were taken at fixed depth increments (0-10, 10-30, 30-50, 50-70 and 70-100 cm). If a horizon boundary was at least 5 cm above or below a sampling boundary, an additional soil sample was taken. This allowed laboratory and field data to be merged. Total organic C (TOC), total inorganic C (TIC) and total nitrogen were measured by dry combustion using an elemental analyser (LECO TRUMAC, St Joseph, MI, USA). Sand content was determined by wet-sieving, and silt and clay contents following the pipette method. Bulk density, rock fragment fraction and pH H2O were measured as described by Jacobs et al. (2018).
As processes governing C stabilisation in organic soil differ from those in mineral soils, and as root-derived C inputs contribute only little to C stocks of organic soil, this study focused solely on mineral soils. Therefore, 165 sites with organic soil were removed from subsequent analyses, leaving 15,935 mineral soil samples from 2,939 sites for which TOC and C:N values were readily available. For the present study, δ 13 C and δ 15 N natural abundance as well as particulate organic C to total organic C (POC:TOC) ratios were also determined. The δ 13 C and δ 15 N measurements were restricted to 1,357 soil samples from 248 core sites. These core sites comprise a representative subset of mineral agricultural soils in Germany with respect to soil reference group, land use and C stock. Detailed criteria for the core site selection are described by Vos et al. (2018). Isotopes were analysed using an isotope-ratio mass spectrometer (IRMS, Thermo Fisher Scientific Delta plus). All δ 13 C values refer to the organic C fraction only. Soil samples containing inorganic C were fumigated with hydrochloric acid, as described by Walthert et al. (2010), prior to δ 13 C analyses. If total C content based on IRMS was higher than TOC based on dry combustion, inorganic C removal was repeated and δ 13 C re-measured. For 183 samples, no reliable δ 13 C measurements could be obtained for the organic C, mostly due to incomplete removal of inorganic C at TIC:TOC ratios over 5.
Values of δ 15 N refer to total N. The δ 13 C values are reported relative to Vienna Pee Dee Belemnite and δ 15 N values are reported relative to atmospheric nitrogen following international standards.
POC:TOC ratios were determined using near-infrared spectroscopy (FT-NIRS; MPA, Bruker Optik GmbH, Ettlingen, Germany). The basic concept for this approach has recently been proven by Jaconi et al. (2019) who also used soil samples from the German Agricultural Soil Inventory, but focused on topsoil (0-10 cm) only. Here, we present POC:TOC ratios for all depth increments of the German Agricultural Soil Inventory. Our predictions of POC:TOC ratios are based on the same 105 topsoil samples used by Jaconi et al. (2019), plus additional subsoil samples from 27 of these 105 sites. The 27 subsoil samples were selected to cover the complete range of clay content and C:N ratios of subsoils in order to get a representative subset of the subsoil samples ( Fig.  A1.1). For each of these 27 subsoil samples, TOC was fractionated into POC and MOC using density fractionation. In brief, soil was dispersed in a sodium polytungstate solution with a density of 1.6 g cm − 3 using ultrasonic power. POC was defined as the light C fraction with a density of < 1.6 g cm − 3 while MOC was defined as the remaining, heavier C fraction. We followed exactly the fractionation protocol, which was previously described by Jaconi et al. (2019) for topsoil. However, because the POC content of subsoil was about one order of magnitude lower than in the topsoil, up to 120 g soil instead of 10 g for topsoils were fractionated. This was time-consuming and cost-intensive (up to 620 g sodium polytungstate per sample), therefore not all the sites included in the study of Jaconi et al. (2019) could be considered for subsoil fractionation. TOC recovery for subsoil samples ranged from 80 to 120%. Spectral pretreatments and models were implemented as proposed by Jaconi et al. (2019). Additional tree-based algorithms (ranger, cubist) and modelling approaches (log-ratio transformation of POC and MOC content vs. log-ratio transformation of POC:TOC and POC:MOC, separate prediction of POC and MOC contents vs. separate prediction of POC:TOC and MOC:TOC ratios) were also tested, as detailed in the Supplementary Material. Model performance was judged based on leave-one-site-out validation, i.e. the data was partitioned by sites and the values of each site were predicted based on models that were trained only based on the samples from the other sites (Jaconi et al., 2019). This validation procedure is more robust than partitioning the dataset just once into training and testing (Hastie et al., 2009). An ensemble of three individual model combinations performed best ( Fig. A1.2) and was used to predict the POC:TOC ratios of all unseen soil samples.
In order to enhance the comparability among sites, all soil data were aggregated to obtain one single value per depth increment (0-10 cm, 10-30 cm, 30-50 cm, 50-70 cm and 70-100 cm). This was done as follows: TOC and TIC concentrations, C:N and POC:TOC ratios, δ 13 C and δ 15 N values, as well as contents of clay and sand, and pH H2O values were aggregated by weighted means, with fine soil stock of the individual depth increments (Mg soil < 2 mm ha − 1 ) as the weighting factor. For continuous variables given in vol-% (rock fragment fraction) or in area-% (e.g. oximorphic features at the profile wall), the thickness of the respective depth increment served as the weighting factor. Categorical variables with only two levels (e.g. buried topsoil, yes or no) were dummy coded (yes = 1, no = 0) and also aggregated, with the thickness of the respective depth increment serving as the weighting factor. For categorical variables with more than two levels (soil horizon and stratigraphy), the factor level with the largest contribution to the depth increment was used. Information on crop rotations, tillage and management during the past decade came from farmer questionnaires. Site-specific C input data was estimated by Jacobs et al. (2020).
Analyses of organic C proxies along the soil profile Each organic C proxy (TOC, C:N, POC:TOC, δ 13 C and δ 15 N) was predicted using machine learning on variables that provided information about current C input (e.g. root-derived C), geogenic C input (e.g. parent material), translocated topsoil C (e.g. horizon symbols), soil transport (e.g. slope) and C stabilisation (e.g. texture). Furthermore, we included climate-related explanatory variables because climate is a major soil-forming factor and can have various effects on C input and C stabilisation. Table 1 provides an overview of all 39 explanatory variables used. In the Supplementary Material, these explanatory variables are characterized in further detail (Table A2.1, Fig. A2.1, Fig. A2.2).
In a first step, the machine-learning algorithm was trained to predict the bulk soil value of each organic C proxy (TOC, C:N, POC:TOC, δ 13 C and δ 15 N) at a given depth solely as a function of these 39 explanatory variables. The resulting 25 models (five organic C proxies x five depth increments) were independent of the soil Table 1 Overview of explanatory variables used to predict organic C proxies. For further details, German translations and maps of these explanatory variables, the reader is referred to properties in the depth increments above or below. In a second step, tests were carried out to assess how much and to what depth model performances would improve if the algorithm was also informed about the measured values of the targeted organic C proxy in the uppermost sampling layer. Thus, TOC content from 10 to 100 cm depth was predicted, as was done in the first step, but here included site-specific TOC contents from 0 to 10 cm depth as an additional (40th ) explanatory variable in the models. This was done in an analogous way to also predicting the other organic C proxies (C:N, POC:TOC, δ 13 C and δ 15 N). In the third and final step, the ratio between a given organic C proxy and its value in the uppermost sampling layer (0-10 cm) was predicted. Such ratios are referred to below as "normalised values" to distinguish them from "bulk soil values". Normalised values were used to describe the depth gradients of organic C proxies. For example, if a site showed bulk soil TOC values of 1 g kg − 1 in 50-70 cm and 5 g kg − 1 in 0-10 cm, the normalised TOC value in 50-70 cm depth would be 1:5 = 0.2 or 20%. Soil layers with normalised values above 300%, e.g. more than three times as much TOC in 50-70 cm than in 0-10 cm, were treated as outliers and excluded from further analyses. Furthermore, C:N ratios of carbonate-rich soils with more than twice as much TIC as TOC were excluded because these measurements were potentially biased. For machine learning, the party::cforest implementation of random forest and bagging ensemble algorithms was used (Hothorn et al. 2005;Strobl et al. 2008;Strobl et al. 2007) owing to its underlying strength in deriving variable importance from a mix of continuous and categorical predictors (Hapfelmeier et al. 2014). For each model, five folds with three repetitions were applied and the hyperparameter mtry was tuned using grid search on the following values: 3, 6, 13, 26, 39 or 59. The performance of each model was evaluated by root mean squared error (RMSE) and mean coefficients of determination (R 2 ) metrics. Permutation importance was calculated based on Hapfelmeier et al. (2014) using the mtry value that produced the highest R 2 . Variables of greater importance than would be expected from a theoretical model where all variables are equally important were considered influential (Hobley et al. 2015). The importance of influential variables was scaled to 100% for better comparability. Explanatory variables that correlated the most were elevation/temperature (Pearson's r = -0.81 in 0-10 cm depth) and sand/clay (Pearson's r = -0.74 in 0-10 cm depth). Because these Pearson's r Master soil horizon symbols based on AD-HOC-AG Boden (2005). Factor levels (English description based on Wittmann et al. 1997): A (mineral topsoil horizon; epipedon); B (mineral subsoil horizon characterized by a change of colour and mineral composition of the parent material resulting from the accumulation of topsoil constituents, which were removed from the overlying horizon or/and weathering in situ; contains less than 75 vol-% of residual parent rock and no lithogenic carbonate); C (unaltered or unweathered material similar to parent material); E (plaggen-horizon; humus content similar to Ah-horizon); G (horizon affected by groundwater, which causes the development of redoximorphic features); H (organic horizon originating from residue of peat-forming plants); M (horizon of fluvial, colluvial soils from translocated soil-material, humus content like Ah); P (subsoil horizon on claystone or clay (> 45% clay) with prismatic to subangular blocky structure and temporarily wide cracks); R (man-made horizon by deep cultivation (> 40 cm), humus content like Ah); S (subsoil horizon affected by a perched water table; surface water stagnation); T (subsoil horizon from the solution residue of carbonate rock, clay content > 65%, < 5% carbonate).
TIC C Total inorganic carbon (g kg − 1 ) * C = continuous variable; F(L) = categorical variable with L levels values were less extreme than the threshold values for collinearity applied in previous work (e.g. Hobley et al. 2015), in our study, collinearity was not regarded as a problematic issue.
Turnover and mean residence time of organic C Maize (Zea mays L.) was the only C 4 plant in crop rotations, and as such introduced a natural 13 C label into the soil. We estimated the fraction F of maize-derived C in TOC after t years of maize based on Balesdent et al. (1987) as: where S1 and S0 were the soils of sites with and without maize respectively and P1 and P0 were photosynthates from maize and from other crops respectively. 13 C P1 was set to − 12‰ (Balesdent et al. 1987). 13 C S0 was defined as the intercept of a linear regression between the number of maize years (t) and the corresponding δ 13 C values of soil ( 13 C S1;t ). The slope of this regression times the number of maize years (t) plus 13 C S0 yielded the δ 13 C value of soil after t maize years ( 13 C S1;t ). 13 C P0 was assumed to be the same as 13 C S0 , which is a common simplification because the difference between C 3 and C 4 -derived organic C is typically about one order of magnitude higher than turnoverinduced C fractionation in soil (Balesdent et al. 1987;Boutton 1996). Carbon turnover was defined as the average increase in F per maize year. Crop rotation was reported for a period spanning 1 to 17 years depending on the site. A pre-test revealed that using sites with crop rotations reported for ≤ 9 years resulted in overestimated organic C turnover ( Fig.  A3.1). Hence, all sites with only nine years of crop rotation data or less were omitted. Among the remaining sites (n = 124), only the past ten years were considered for calculating organic C turnover in order to increase the comparability among sites.
The mean residence time of organic C was estimated following Amelung et al. (2008) as -t/ln(1-F(t)), where 1-F(t) represents the remaining proportion of C 3 -C after t years of maize. This approach assumes: (i) TOC stocks to be in equilibrium and (ii) first-order kinetics for organic C decomposition (Balesdent and Mariotti 1996).

Soil compaction vs. root-derived deep C input
The level of soil compaction was classified according to Schneider and Don (2019) as not compacted (packing density < 1.75 g cm − 3 ), moderately compacted (1.75 g cm − 3 < packing density < 1.82 g cm − 3 ) or severely compacted (packing density > 1.82 g cm − 3 ). A given site was classified as compacted at depth i if a compacted soil layer occurred at or above depth i . This was done in order to account for the fact that compacted soil layers (e.g. ploughpan at 30-50 cm) may act as a barrier to root-derived C inputs at greater depths (e.g. subsoil below ploughpan). Root-derived C inputs were inferred from normalised TOC, C:N, POC:TOC, δ 13 C and δ 15 N values, i.e. measured values of TOC, C:N, POC:TOC, δ 13 C and δ 15 N at a given depth i divided by their respective value in the uppermost sampling layer (0-10 cm). Normalised values were preferred over bulk soil values because normalisation minimised site-specific differences in organic C. Other factors influencing the depth gradients of TOC, C:N, POC:TOC, δ 13 C and δ 15 N, apart from root-derived C inputs, were assumed to be independent of the level of compaction. The relationship between root-derived C input and soil compaction was examined by depth (30-50 cm, 50-70 cm and 70-100 cm). Topsoil was not considered because the majority of hardpans occurred in subsoils. The analyses were restricted to cropland soil only (2,261 sites) because compaction in cropland was more severe than in grassland.

General statistics & software
All data analysis was performed using R (R Core Team 2019) in RStudio (RStudio Team 2019) and built on tidyverse packages (Wickham et al. 2019). Computationally intensive code was implemented in parallel using the foreach package (Microsoft and Weston 2020) and run on a high-performance server with 64 cores (128 threads). Spectral pre-treatments were done using prospectr (Stevens and Ramirez-Lopez 2013), while pls (Mevik et al. 2019), ranger (Wright and Ziegler 2015) and cubist (Kuhn and Quinlan 2018) were used for spectral modelling. Each key statistic in the text is accompanied by its 95% bias-corrected and accelerated (BCa) confidence interval based on ≥ 3000 bootstrapped resamples calculated using the boot package (Canty and Ripley 2019).

TOC
In 0-10 cm depth, TOC content averaged 23.5 g kg − 1 (95% CI, 23.0 to 24.1) and its interquartile (middle 50%) ranged from 12.9 to 29.6 g kg − 1 . Variation in TOC at this uppermost depth increment was mostly caused by different land use (cropland vs. grassland) and clay content (Figs. 1 and 2). In 0-10 cm, the TOC content in grassland was 2.4 times (95% CI, 2.3 to 2.5) greater than that in cropland and increased significantly with clay content (Fig. A3.2). Land use and soil texture showed distinct regional patterns, which were reflected in the TOC map ( With increasing soil depth, the importance of variables to explain TOC content changed considerably (Fig. 2). Instead of land use, climate-related variables were the most important for predicting TOC in 10-30 cm depth. TOC content correlated positively with precipitation and negatively with the number of summer days (Fig. 2). The "Weser-Ems-Geest" climatic region in north-west Germany, which is characterised by a maritime climate (high precipitation, not many summer days) and historic heathland and peatland cover, showed the largest TOC content (Fig. 3d). In 30-70 cm depth, soil reference group and horizon symbols were most important for explaining TOC. For example, in 50-70 cm, Anthrosols contained 8.4 g kg − 1 (95% CI, 7.3 to 9.8), Chernozems, Phaeozems and Vertisols contained 5.8 g TOC kg − 1 (95% CI, 5.5 to 6.2), while all other soil groups contained only 3.1 g TOC kg − 1 (95% CI, 2.9 to 3.2) (Figs. 3a and 4). At the same depth, M horizons designating colluvial deposits (mean 7.2 g kg − 1 ; 95% CI, 6.5 to 8.0) and deep or buried A horizons (mean 5.8 g kg − 1 ; 95% CI, 4.8 to 7.1) showed significantly larger TOC contents than other soil horizons (mean 3.5 g kg − 1 ; 95% CI, 3.4 to 3.7) (Fig. 3b). Finally, below 70 cm depth, geological chronostratigraphy and texture, both characterising the soil parent material, were the most important for predicting the contents of TOC in this subsoil depth. Holocenic sediments, which comprised mostly fluvial deposits along major river valleys (Elbe and Rhine), marine deposits along the north-west coastline and colluvial deposits at the foot of slopes, showed TOC contents that were three times (95% CI, 2.6 to 3.4) greater in 70-100 cm depth than soils from other, older parent materials. As with topsoil, the content of clay in subsoil also correlated positively with that of TOC ( Fig. A3.2). Although TOC content in grassland was significantly higher than that in cropland at all depth increments ( Fig. 1), land use was not relevant for explaining TOC below 10 cm depth (Fig. 2). Different TOC contents in the subsoils of cropland and grassland were related to preferential grassland use on Gleysols, presumably because of traffic restrictions for heavy machinery and elevated flood risks. Many Gleysols developed from C-rich holocenic sediments (Fig. 5). Furthermore, Gleysols may accumulate C due to oxygen limitationboth processes mask potential effects of land use on TOC in subsoils. If the different TOC contents in subsoils of cropland and grassland were due to land use, the relative difference between the TOC contents of cropland and grassland should have decreased with depth. However, the opposite was the case: the relative difference was lowest in 10-30 cm, where TOC content under grassland was 1.27 times (95% CI, 1.21 to 1.33) higher than that under cropland. Below 30 cm, the relative difference between grassland and cropland increased again to 1.62 times (95% CI, 1.41 to 1.90) higher TOC contents under grassland than that under cropland in 70-100 cm soil depth. Overall, predictions for bulk TOC without information from other depth increments were better in 0-10 cm (R 2 of 0.70) than below this depth (R 2 from 0.39 to 0.45). Including the TOC contents of 0-10 cm as an additional predictor of TOC in > 10 cm depth improved model performance considerably, confirming that TOC contents were site-specific and that TOC in topsoil and subsoil was related (Fig. A3.3). The importance of TOC in 0-10 cm for explaining TOC below 10 cm depth decreased with greater depth.
Depth gradients of TOC, which were expressed as normalised TOC values, depended less on clay content than was the case with TOC values of bulk soil (Fig. 2). Instead, normalised TOC values were related more to land use and soil horizons. In the upper 30 cm of grassland, TOC content decreased much more with depth than it did under cropland (Fig. 1), which is why the effect of land use on normalised TOC was most pronounced in 10-30 cm depth (Fig. 2). In 30-70 cm, differences between soil horizons were most important for explaining normalised TOC values. For example, in 30-50 cm, M horizons and A horizons still contained 43% of TOC in 0-10 cm (95% CI, 42 to 45), while the other master horizons contained only 27% of TOC in 0-10 cm (95% CI, 26 to 27). Below 50 cm depth, geological chronostratigraphy became increasingly important for explaining normalised     L a n d u s e TOC contents, while parameters related to soil formation processes (soil horizon and soil reference group) became less importanta pattern similar to that observed for TOC values of bulk soil.

C:N ratio
In 0-10 cm, the mean C:N ratio of bulk soil was 11.2 (95% CI, 11.1 to 11.3) and its interquartile range was from 9.9 to 11.7. This variation was mostly related to texture. The C:N ratio increased with increasing sand content, especially above 70% sand, while it decreased exponentially with increasing clay content to around 10 for clay contents above 10% (Fig. A3.2). High C:N ratios clustered in particular in the "Old Drift" region ( Fig. 3c, Fig. A2.2) and the "Weser-Ems-Geest" pedoclimatic zone (Fig. 3d, Fig. A2.2). These regions are located in north-west Germany and are characterised by historic heathland and peatland cover. At greater depths, Random Forest identified similar drivers and performed equally well in explaining C:N ratios of bulk soil as for 0-10 cm soil depth (see above). In 30-70 cm, differences in C:N ratios by soil reference groups were most pronounced (Figs. 3a and 4) and consequently the variable importance of soil reference groups increased (Fig. 2). This was particularly because Podzol soils showed remarkably wide C:N ratios in subsoils. The C:N ratio of Podzols increased with greater depth, which was different to all the other soil reference groups (Figs. 3a and 4). Surprisingly, at all depth increments, the C:N ratio of cropland and grassland was very similar (Fig. 1) and consequently land use was not acknowledged as being important for predicting C:N ratios (Fig. 2). The site-specific nature of C:N was highlighted by a strong improvement of models for subsoil C:N when informed about the respective C:N ratio in 0-10 cm (Fig. A3.3).
The C:N ratios changed much less with increasing soil depth than TOC. In 70-100 cm depth, bulk soil C:N was only 25% lower than in 0-10 cm depth (95% CI, 24 to 27). Even land use and tillage introduced only minor changes in the depth distribution of C:N ratios (Fig. 1). Normalisation of C:N ratios, i.e. expressing C:N ratios in terms of the percentage of their values in 0-10 cm, made land use important as a predictor of the ratios in 10-30 cm. It also increased the importance of groundwater and geological chronostratigraphy, whereas soil texture had less explanatory power (Fig. 2, Fig. A2.2).

POC:TOC ratio
The POC:TOC ratio showed the most distinct regional pattern of all the organic C indicators examined in this study (Fig. 1). In north-west German lowlands, bulk soil POC:TOC ratios above 30% were widespread in 0-10 cm of soil, while other regions had lower POC:TOC ratios. As for C:N ratios, soil texture and variables identifying historic heathland and peatland cover were most important for predicting POC:TOC ratios in 0-10 cm soil depth (see "Old Drift" in Fig. 3c; "Weser-Ems-Geest" in Fig. 3d). The POC:TOC ratio decreased exponentially with increasing clay content (Fig. A3.2).
With increasing depth, mean POC:TOC ratios decreased from 20% (95% CI, 19 to 20) in 0-10 cm to 11% (95% CI, 11 to 12) in 70-100 cm depth. Despite this decrease, the importance of explanatory variables hardly changed with depth, and texture remained the most important explanatory variable, which made POC:TOC in 0-10 cm a good predictor of POC:TOC at greater depths ( Fig. A3.3). Major variations in subsoil POC:TOC were mostly associated with human soil profile modifications (Anthrosols, Fig. 4) and topsoil burial. Normalised POC:TOC ratios, which show the differences between sampling depths, were largely independent of soil texture, in contrast to the POC:TOC ratios of bulk soil. Instead, normalised POC:TOC ratios were driven by land use, soil reference group, soil region, parent material and pedo-climatic zone. In cropland, POC:TOC ratios decreased more gently with increasing depth than in grassland (Fig. 1). Podzols showed a much stronger decline of POC:TOC ratios with depth than other soil reference groups, even though the C:N ratio in Podzols increased with depth (Figs. 3a and 4). In the "Old Drift" soil region and Fig. 2 Important predictors of TOC content, C:N ratio, POC:TOC ratio, and δ 13 C and δ 15 N values by depth. Left: results for predicting bulk soil values without information from other depth increments. Right: results for predicting normalised values that describe depth gradients. Earthy colours represent pedology, grey represents geology and geomorphology, green represents anthropogenic influence and blue represents climaterelated variables. Areas are proportional to the relative importance of variables for predicting a given target at a given depth increment. Each model is characterised by the number of observations in the training data ("n") and average errors from three timesrepeated, five-fold cross validation (root mean square error "RMSE" and R-squared "R 2 ") For better clarity, only the most important factor levels are shown (for variables with more than ten factor levels, this is the ten most common levels; only factor levels with more than three observations in δ 13 C) Means are only shown for n > 5. Error bars illustrate bootstrapped 95% confidence intervals of the means "Ems-Weser-Gest" pedo-climatic zone, which both had historic heathland and peatland cover, POC:TOC ratios declined much more strongly along the soil profile than in other regions (Fig. A2.2).
In 70-100 cm, soil with mean groundwater tables shallower than 80 cm showed a mean δ 13 C value of -27.7‰ (95% CI, -28.0 to -27.5), while soil with groundwater tables deeper than 200 cm showed a mean δ 13 C of -26.3‰ (95% CI, -26.4 to -26.1). Similarly, in horizons with groundwater causing reducing conditions (Gr-horizons), an average δ 13 C value of -27.7‰ (95% CI, -28.0 to -27.5) was detected, while in other horizons δ 13 C was − 26.5‰ (95% CI, -26.6 to -26.4) in 70-100 cm The secondary y-axis shows the proportion of maize-derived carbon (C) assuming a two-pool mixing model with the mean δ 13 C without maize (-27.3‰) for the C 3 source and − 12‰ for the C 4 source. Maize was the only C 4 crop included in the crop rotations. Right: δ 15 N in grassland as a function of livestock units per hectare (n = 61). The secondary y-axis shows the share of organic fertiliser-derived nitrogen (N) assuming a two-pool mixing model with the mean δ 15 N without livestock (4.3‰) for plant litter-derived N and 10‰ for organic fertiliser depth (Fig. 7). Furthermore, in 70-100 cm, δ 13 C of TOC decreased with increasing TIC content. Subsoil δ 13 C values below 50 cm depth were independent of the δ 13 C in the surface soil (0-10 cm) due to the dominant influence of groundwater ( Fig. A3.3). Cropland use and maize crops in particular smoothed the depth gradient of δ 13 C since maize as a C 4 plant enriched 13 C in the topsoil (Fig. 7). Generally, changes in δ 13 C along the depth profile were relatively small, with Gr-horizons in 70-100 cm even containing slightly more 13 C than in 0-10 cm depth (mean 101%; 95% CI: 99 to 103).
δ 15 N In 0-10 cm, the mean δ 15 N value was 6.0‰ (95% CI, 5.8 to 6.2). With increasing depth, δ 15 N increased to 6.6‰ (95% CI, 6.3 to 6.8) in 30-50 cm before it decreased again and reached 5.6‰ (95% CI, 5.4 to 5.9) in 70-100 cm (Fig. 1). In 0-10 cm, differences in bulk soil δ 15 N were mostly due to land use, management and variables identifying historic heathland and peatland in north-west lowlands (Fig. 2). In 0-10 cm depth, the δ 15 N value under cropland was 1.2‰ points (95% CI, 0.7 to 1.7) higher than under grassland. Generally, δ 15 N correlated positively with the proportion of canola (Brassica napus), beet (mostly Beta vulgaris) and legumes (mostly Trifolium sp.) in crop rotations (Fig. 2). Additionally, soil δ 15 N values under both cropland and grassland were significantly lower in the "Ems-Weser-Geest" and "Old Drift" regions than in other regions (Fig. 3c, d). In grassland topsoil, soil δ 15 N was positively correlated with livestock density (Fig. 6). In the subsoil, land use and management lost explanatory power; instead, soil reference group and groundwater were important for predicting δ 15 N. In 70-100 cm depth, Gr-horizons were depleted in 15 N, showing 2.4‰ (95% CI, 1.9 to 2.8) lower δ 15 N values than other horizons. At this depth, δ 15 N in Gr-horizons averaged 78% of δ 15 N in 0-10 cm (95% CI, 66 to 110), while in other horizons δ 15 N averaged 104% of δ 15 N in 0-10 cm (95% CI, 98 to 112). The depth gradient of δ 15 N was thus best described by land use and groundwater influence. δ 15 N values under grassland increased more with greater depth than they did under cropland because of lower values in the surface soil (Fig. 1).

Turnover of maize-derived soil carbon
Every growing season with maize increased the δ 13 C value of cropland soil by on average 0.12‰ points (95% CI, 0.05 to 0.19) in 0-10 cm depth (Fig. 6). Based on Eq. 1, this translates to a mean turnover rate of 0.8% TOC year − 1 (95% CI, 0.3 to 1.3). In 10-30 cm depth, the C turnover rate was roughly the same (95% CI, 0.3 to 1.2). These turnover rates indicate that C resides in the ploughing layer for around 100 to 300 years on average before being released back again into the atmosphere. Below 30 cm depth, the proportion of maize in crop rotations did not correlate with δ 13 C, which is why turnover and residence time of subsoil C could not be determined.  Fig. 7 Mean δ 13 C values of bulk soil by land use, soil texture and groundwater level. Sample numbers ranged from n = 2 to 27 for groundwater < 2 m, and from n = 7 to 48 for groundwater > 2 m, depending on depth increments and land use. Error bars represent bootstrapped 95% confidence intervals and stars denote significant differences (P < 0.05) Soil compaction vs. root-derived deep C input In cropland soil with hardpans, TOC content and C:N ratio declined more with increasing depth than in cropland soil without hardpans (Fig. 8). Their normalised values, i.e. values at depth i relative to values in 0-10 cm, decreased significantly at increasing levels of soil compaction. The magnitude of this effect decreased with increasing soil depth. The above pattern was similar for POC:TOC ratios, but the significance here was not consistent between depth increments. In contrast to our initial hypothesis, the depth gradients of δ 13 C and δ 15 N values seemed not to be influenced by hardpans and showed no consistent trends with soil compaction.

Discussion
The current study is the first to provide a representative picture of the depth gradients of C:N, POC:TOC, δ 13 C and δ 15 N in Germany's mineral agricultural soils. Based on the large number of soil samples (2,931 sites * average of 5.4 depth increments) and high data quality (all soil analyses in one laboratory), our data-mining approach revealed previously unseen patterns in TOC depth distribution and soil organic matter quality to 100 cm depth at regional scale. In the following, we discuss how these patterns can be used to trace different entry paths of organic C into German agricultural soils. We differentiate between: & autochthonous C, which covers aboveground and belowground C inputs from plants that grew on top of a given soil profile, & C input from organic fertiliser (animal excreta), and & parent material-inherent C, which comprises allochthonous C from plants that once grew somewhere else, e.g. upslope or upstream.
Also, we discuss the importance of vertical redistribution within soil profiles for explaining subsoil C, as well as texture and groundwater as major causes of variable C retention in German agricultural soils.

Origin of topsoil carbon
Most organic C entered the topsoil in the usual way, i.e. it originated from plants that grew on top of a given shown with error bars representing bootstrapped 95% confidence intervals. Different letters indicate significant differences (P < 0.05). Sample numbers are shown below each dot agricultural land, and not from somewhere else, and is therefore of autochthonous origin. The natural label of maize-derived C allows the fate of these autochthonous C inputs to be tracked. Our results suggest that each cropping year with maize replaced 0.3 to 1.3% of TOC in the upper 30 cm of cropland. Poeplau et al. (2020) estimate German cropland to store on average 61 Mg TOC ha − 1 in 0-30 cm (mineral soils only). Thus, autochthonous C contributed to the current soil organic C stock of German cropland at a mean rate of 0.3 to 1.3% of 61 Mg TOC ha − 1 year − 1 = 0.2 to 0.8 Mg C ha − 1 year − 1 . This estimate is about one order of magnitude lower than the C input of 3.2 Mg ha − 1 year − 1 , which was recently proposed by Jacobs et al. (2020). However, this number presented by Jacobs et al. (2020) refers to all plant residues and photosynthates that are not harvested, including stubble, mulch and other litter material > 2 mm. Soil organic C, however, includes only the organic C moieties < 2 mm. Plant residues > 2 mm have to be decomposed before entering the soil organic C pool. Decomposition is accompanied by respiration and organic C losses, which explains why less than 25% of the initial 3.2 Mg C input ha − 1 year − 1 that remains in the fields after harvest finally ends up in the fine soil. Once in the fine soil, organic C is likely to remain there for a mean period of 100 to 300 years before being mineralised and released back into the atmosphere. This mean residence time represents a rough approximation assuming steady state conditions and first-order decomposition rates (Gleixner et al. 2002)we did not perform radiocarbon dating. Uncertainties also remain from the use of single-pool mixing models; due to uncertain data in the long term we did not perform estimations using two-pool C concepts (Derrien and Amelung 2011). Nevertheless, our estimate is in good agreement with commonly reported radiocarbon ages of TOC in the ploughing layer (e.g. Flessa et al. 2008).
Our method of calculating organic C turnover differed from most previous studies and contains additional uncertainties. Our calculations are based on: (i) sites with crop rotations instead of continuous maize cultivation, (ii) unknown maize-derived C inputs dating back more than 10 years, and (iii) about 120 core sites instead of just one or a few sites. We believe that (iii) compensates for potential noise introduced by (i) and (ii). Once organic C has entered the fine soil, the vast majority resides there for > > 10 years. We did not observe a systematic pattern between the time period since the last maize crop and the δ 13 C value of soil (Fig. 6, green   dots). Hence, the exact timing of the last maize cropping in the past 10 years barely influenced organic matter turnover estimates in the range of 100 years and beyond. Unknown historic maize-derived C input was a potential source of bias in our turnover calculation. If the recorded share of maize in crop rotations within ten years before sampling correlated with historic maize use, we would have over-estimated TOC turnover. However, a pre-test suggested that this was not the case for the chosen reference period of ten years (cf. Fig. A3.1). Also, our estimate for C turnover (0.3 to 1.3% of TOC year − 1 ) is at the lower range of what has previously been observed in field trials and, therefore, an overestimation of organic C turnover is unlikely. Schiedung et al. (2017) reported much faster C turnover: here, after two years, maizederived C had substituted 7.4 ± 3.2% of TOC in the topsoil (0-30 cm) and 2.9 ± 1.7% of TOC in the subsoil (30-50 cm), yet in their study shredded maize stubble was left as mulch on the soil surface, probably enhancing maize-derived C input into the topsoil. Flessa et al. (2000) reported that 15% of TOC in 0-30 cm was maize-derived after 37 years of continuous maize cropping, which translates into a turnover of 0.4% of TOC year − 1 . Rasse et al. (2006) estimated topsoil organic C turnover to be 1.0% of TOC year − 1 for a Eutric Cambisol site in France. We therefore conclude that the simplified estimation of our turnover rate at 0.3 to 1.3% of TOC year − 1 is a reasonable magnitude for TOC turnover in the ploughing layer of German cropland.
In grassland, δ 15 N values were indicative of additional C and N inputs in the form of organic fertilisers. Elevated δ 15 N values in the topsoil of grassland with high livestock densities indicated a significant proportion of soil N to be derived from slurry or manure. Slurry-derived and manure-derived N tends to be enriched in 15 N because of isotopic discrimination during digestion and ammonia volatilisation during storage, resulting in δ 15 N values ranging from 6 to 13‰ (Kriszan et al. 2014). Each unit (500 kg) increase in livestock per hectare increased δ 15 N of grassland in 0-10 cm by on average 0.9‰. This slope is in perfect agreement with the one documented by Kriszan et al. (2014), who studied the effect of stocking rates on δ 15 N in the upper 5 cm of nine grassland sites in western Germany. Based on a positive correlation between δ 15 N in topsoil and overall farm N balances, Kriszan et al. (2014) proposed using δ 15 N as an indicator for N-use efficiencies at farm scale. Grassland of farms with positive N balances, i.e. greater N input than N export, showed topsoil δ 15 N values above 5. 4‰ (Kriszan et al. 2014). In the present study, half of all grassland sites were above this threshold value, indicating that about half of Germany's grassland shares a decadal history of N surpluses due to excessive organic fertilisation. Assuming that δ 15 N in grassland topsoil without slurry and manure fertilisation was 4.3‰ (intercept of linear regression line of δ 15 N vs. livestock), average δ 15 N of organic fertilizer was 10‰ and there was similar fractionation of soil N and amended slurry/manure N, then each livestock unit increased the share of organic fertiliser-derived N under grassland and 0-10 cm by 15 percentage points (95% CI, 8 to 22; Fig. 6). Considering that organic C and total N were highly correlated (grassland, 0-10 cm: R 2 = 0.86), the proportion of organic fertiliser-derived topsoil C can be assumed to be similar to the organic fertiliser-derived proportion of topsoil N. This estimate should be considered as a very approximate figure because divergence in both δ 15 N of N input and fractionation factors for δ 15 N in soil can be huge (Craine et al. 2015;Högberg 1997)much higher than for δ 13 C. Nonetheless, the effect of livestock on δ 15 N values in soil indicates that in addition to plantderived C input, manure-derived and slurry-derived C has also contributed significantly to topsoil C stocks in German grassland.
The Random Forest algorithm identified systematic patterns in TOC contents, C:N and POC:TOC ratios between sampling sites and depths. These patterns were indicative of land use and management-driven differences in the quantity and quality of C inputs and subsequent redistribution of these inputs. In 0-10 cm, land use was the most important predictor of TOC and grassland showed a TOC content 2.4 times greater than cropland. This might be explained by grassland typically receiving a greater total root-derived C input than cropland (Don et al. 2009). Root-derived C resides in soil longer than other types of C input (Rasse et al. 2005). This supports the hypothesis that higher TOC content in grassland compared with that in cropland is caused by different quantities of root-derived C input. Differences in TOC contents by land use were further amplified by tillage-induced redistribution of TOC under cropland. Most cropland in Germany is ploughed annually to an average depth of about 30 cm, which decreases TOC content in 0-10 cm and increases it in 10-30 cm. Thus, differences in the depth gradient of TOC in the upper 30 cm between grassland and cropland arose from both different C input and tillage-induced redistribution. Both processes together explain the large land use effects on the depth gradients of TOC contents, as well as of POC:TOC and δ 13 C and δ 15 N values (Fig. 2, right) tillage effects could not be separated from C input effects.
Interestingly, in 0-10 cm, Random Forest considered land use to be the most important variable for explaining TOC, but land use was irrelevant for explaining absolute C:N and POC:TOC ratios (Fig. 2, left). This was surprising considering the greater root-derived C input and much less frequent or absent tillage in grassland compared with cropland. A higher TOC content at equal C:N and POC:TOC ratios in grassland than in cropland can only be explained if both POC and MOC were equally elevated under grassland, indicating that the amount of MOC in the topsoil is governed not only by texture but by input as well: under steady state conditions, topsoil MOC content thus also increases with increasing C input (Rehbein et al. 2015).
In addition to recent land use and management effects on topsoil TOC, our study confirmed the legacy effects of historic heathland and peatland cover on organic C quality and quantity unique to the north-west part of Germany . Here, the maritime climate with relatively large amounts of rain in combination with forest clearances and subsequent spread of heathland favoured the podzolisation of old drift sands (Behre 2008;Schmidt and Roeschmann 2014). Ironpans that developed during podzolisation hindered water drainage and favoured the spread of peatland on a large scale. In the last few centuries, most of the peatland has been drained, the peat harvested, ironpans broken and former heathland and peatland converted to agricultural land (Schneider et al. 2017). Today, the affected soils still have an elevated TOC content (Springob and Kirchmann 2002), C:N ratios > 13  and POC:TOC values > 30% . Detailed information about the extent of the historic heathland and peatland is limited and therefore could not be included as a covariate in this study. However, the region in which historic heathland and peatland have been found is characterised by a distinct climate (considerable rain, limited sunshine, mild winters), soil region (Old Drift), parent material (drift sand) and the pedo-climatic zone (Weser-Ems-Geest). Therefore, these variables informed the Random Forest models about historic heathland and peatland cover. Direct climate effects on recent C inputs or C turnover could not be inferred from the present study. Also, the contribution of black carbon to TOC could not be elucidated in the present study because no proxy for black carbon has been analysed.

Origin of subsoil carbon
Below 30 cm depth, the 13 C label of maize and the 15 N label of organic fertiliser could no longer be detected, which confirms at national scale what many studies have previously shown at field scale: subsoil C and N are much less responsive to management-induced differences in C and N inputs than topsoil C and N (Poeplau and Don 2013). Subsoil C must therefore be older than topsoil C and exchanged with atmospheric C on a much longer timescale, millennial rather than centennial (Balesdent et al. 2018). In the Random Forest models, information on land use and management, which was recorded in the German Agricultural Soil Inventory for 10 years preceding sampling, was not relevant for explaining bulk soil TOC content below 30 cm depth. The only indication of significant exchange between subsoil C and the atmosphere was offered by the depth gradients of TOC, C:N and POC:TOC in soil profiles with hardpans (Fig. 8, Fig.  A3.4). Below hardpans, TOC contents as well as C:N and POC:TOC ratios were relatively low, suggesting that hardpans significantly restricted root-derived litter input into subsoils. This confirms the existence of root-derived C inputs to the subsoil. However the question remains of how much deep roots and their rhizodeposits contribute to the TOC stocks of subsoils. If organic C were solely root-derived and the input of root-derived C were only a function of depth, it should be possible to predict the organic C quantity and quality along the soil profile from the organic C quantity and quality of the topsoils. However, the importance of topsoil C in 0-10 cm for explaining subsoil C decreased with increasing depth (Fig. A3.

3) because:
& C input is not only a function of depth but also of root-restricting soil properties. & the contribution of translocated C or parent materialinherited C increases with increasing depth, and/or. & with increasing depth, texture-driven differences in the residence time of C increasesubsoil C is governed more by clay content than by rootderived C input.
This last point of texture-driven C retention has a huge effect on TOC storage, especially in the subsoil.
Deep in the soil profile, C input is typically low and C stabilisation governs how much C is being stored. Given two subsoils that only differ in C input, the subsoil receiving the higher C input will also show the larger TOC content. However, if a third subsoil not only differs in C input but also in soil texture, interpreting its TOC content is not straightforward because texture is key to C retention. This makes separating the effects of C input and C stabilisation challenging at sites with different textures. At a regional scale, however, as in German agricultural soils, texture tends to vary much more between sites than with soil depth 2 , therefore texture-driven differences in C retention can be overcome by evaluating depth gradients of TOC instead of independent bulk soil values. Depth gradients allow inferences to be drawn about the depth distribution of organic C inputs even at sites with contrasting soil textures.
In the present study, we used a Random Forest algorithm to elucidate patterns behind the depth gradients of organic C in Germany's agricultural soils. To explain the depth gradients of TOC contents as well as C:N and POC:TOC ratios, the algorithm ranked variables identifying those variables describing soil parent materials (chronostratigraphy), lateral soil transport (soil horizon symbol "M"), topsoil burial (relic topsoil) and vertical translocation of C (soil groups 3 : Anthrosols, Chernozems and Podzols) as particularly important. About 35% of agricultural soils developed from relatively young, holocenic deposits, most of which were either of fluviomarine or colluvic origin. Such alluvial and colluvial sediments typically comprise large amounts of allochthonous C from upslope or upstream areas , Mayer et al. 2019) and can bury significant amounts of autochthonous C at their depositional sites (Chaopricha and Marín-Spiotta 2014). In view of elevated subsoil C contents and only minor decreases of TOC in soil profiles from holocenic deposits (Fig. 5), we hypothesise a significant proportion of allochthonous C and buried autochthonous C to be still preserved in subsoils. Floodplains and wide valley bottoms are widely acknowledged to act as major C sinks in the landscape (Sutfin et al. 2016). Hoffmann et al. (2009) estimated the non-alpine part of the Rhine basin to have accumulated about 1 Pg of organic C from upstream and upslope areas since the beginning of the floodplain deposition, which translates to a mean Holocene sequestration rate of roughly 50 kg C ha − 1 year − 1 . Recent C sequestration rates in the Rhine basin are hard to quantify, but there is evidence that agricultural intensification has increased pre-human C sequestration rates of the Rhine basin by at least one order of magnitude due to stronger hillslope erosion (Hoffmann et al. 2013). This highlights the importance of lateral C fluxes in explaining soil organic C stocks, especially (i) when moving beyond pedon to landscape and regional scales , and (ii) for explaining subsoil C stocks due to their lower C turnover than topsoils. Organic C inherited from holocenic, i.e. relatively young, soil parent material appears to be a key contributor to subsoil C stocks in German agricultural soils.
Another major contributor to subsoil C was topsoil C translocated into the subsoil via anthropogenic soil profile modifications (e.g. deep ploughing). Numerous agricultural soil profiles have been modified by farmers far beyond annual tillage depths with the goal of site melioration. Such Anthrosols showed much higher TOC contents, and wider C:N and POC:TOC ratios in subsoils than most other soil groups. The depth gradients of these parameters were very noisy in Anthrosols (Fig. 4) but these depth gradients were also less steep on average than in other soil groups (Fig. 3a). The noise in the depth gradients of Anthrosols illustrates the diversity of their underlying formation. Anthrosols comprise a large variety of soils that have all been modified profoundly by human activity, including deep-ploughed soil, mining overburden, landfills and plaggen soil. Most deep soil profile modifications have accidentally translocated and buried great amounts of topsoil C in deeper soil layers. Previous studies have shown that such topsoil burial increases the residence time of C and thus soil organic C storage (Alcántara et al. 2016). Anthrosols mostly occurred either in the Old Drift landscape of north-west Germany or in viticultural areas along the Rhine and Mosel rivers, but they covered a significant proportion (8%) of agricultural land in Germany, making anthropogenic soil profile modifications an important feature for explaining the variability in subsoil C stocks and organic C depth gradients at national scale.
Furthermore, soil biota seems to have transferred great amounts of topsoil C into the subsoil. In the German soil classification (AD-HOC- AG Boden 2005), epipedon with strong bioturbation is coded with the soil horizon symbol "Ax". On average, Axhorizons reached a depth of 76 cm (95% CI, 74 to 80). Most "Ax"-horizons occurred in Chernozems, which interestingly showed relatively high TOC but relatively low POC:TOC values in subsoils compared with other soil groups (Fig. 3a). This could be explained by the activity of anecic earthworms, which might not only bury topsoil C but also convert particulate organic matter to mineral-associated organic matter and thus decrease POC:TOC ratios (Vidal et al. 2019).
Finally, some topsoil C was translocated to subsoils during podzolisation. About 3.5% of all sampled sites were Podzols. Most of them developed from pleistocenic drift sands in north-west Germany. Podzols showed much higher C:N but lower POC:TOC ratios in subsoil compared with topsoil, indicating that instead of POC, dissolved organic C species with a high C:N ratio were illuviated in the subsoil of Podzols (Sauer et al. 2007). The great importance of the "Weser-Ems-Geest" pedoclimatic region and the "Old Drift" soil region in explaining the depth gradients of C:N and POC:TOC points to considerable amounts of refractory C from historic heathland and peatland cover in soil profiles in this region. However, many agricultural soils in the "Weser-Ems-Geest" area were also subject to deep soil profile modifications (Anthrosols) or podzolisation (Podzols), which makes it difficult to disentangle the individual processes to explain elevated subsoil C in this area.
Effects of texture and groundwater on organic C The close correlation between organic C content and soil texture and with variables characterising oxygen availability (depth of groundwater table, Gr-horizons) underlines their influence on the retention and turnover of soil organic matter. Clay increases the C-storage capacity of soils because it offers a large, charged specific surface area for sorption of organic C (Hassink 1997;von Luetzow et al. 2008). Mineral-associated organic C mostly consists of organic matter with relatively low C:N ratios (Jilling et al. 2018). The observed decrease in C:N and POC:TOC ratios with increasing clay content (Fig. 2) can therefore be attributed to the preferential retention and microbial conversion of organic C on mineral surfaces. Sorption of organic C on clay surfaces decreases C turnover (von Luetzow et al. 2008), whereas microbes allocated on mineral surfaces directly incorporate metabolised C into their biomass and necromass (Kögel-Knabner and Amelung 2014). This makes soil texture a key variable for explaining the C stocks of agricultural soils in Germany (Vos et al. 2019).
Groundwater can decrease C turnover because it restricts oxygen availability and therefore biological activity (Blazejewski et al. 2005;Marin-Spiotta et al. 2011). About 11% of all examined sites show Gr-horizons, i.e. reducing conditions in at least 9 out of 12 months (AD-HOC- AG Boden 2005). These horizons had much lower δ 13 C and δ 15 N values than those of other soil horizons with higher oxygen availability at similar depths. Since microbial turnover constantly enriches 13 C and 15 N in the metabolites (Boutton 1996), low δ 13 C and δ 15 N values in water-saturated Gr-horizons may indicate retarded biological decomposition (Natelhoffer and Fry 1988).
In soil with a similar C input, different δ 13 C values of bulk soil tended to be associated with different POC:TOC ratios. In Gr-horizons, however, δ 13 C and also δ 15 N were low, while POC:TOC ratios were average. Instead of retarded conversion of POC to MOC and associated changes in POC:TOC, it could be retarded MOC turnover that keeps δ 13 C and δ 15 N values in Gr-horizons relatively lowin other horizons, MOC turnover could lead to an enrichment of heavy isotopes, particularly on the timescale of pedogenesis. Alternatively, the low δ 13 C and δ 15 N values observed in hydric soil could be explained by compound-specific organic matter turnover (Gurwick et al. 2008). Under anaerobic conditions, 13 C-enriched carbohydrates might be decomposed at a higher rate, resulting in the selective preservation of 13 C-depleted lignin compounds (Benner et al. 1987;Spiker and Hatcher 1987). Finally, it should be borne in mind that some Gleysols developed from C-rich holocenic sediments. The degree to which allochthonous C from aquatic sources, which also has light isotope signatures (Finlay and Kendall 2007;Laskov et al. 2002), contributes to the TOC content and thus to δ 13 C and δ 15 N values in the Grhorizons warrants further clarification. It is noteworthy that groundwater had a much stronger effect on bulk soil values of δ 13 C than on its depth gradient. This was because groundwater-affected soil showed slightly more negative δ 13 C values than sites without groundwater influence, even in topsoil. Generally, groundwater tables were too low to affect C decomposition even in the topsoil. Instead, different source signals probably contributed to the observed differences in the δ 13 C values of topsoils. Plants growing at sites without groundwater influence might experience more frequent and severe drought stress than plants growing at sites with high groundwater levels. Under water stress, C 3 plants tend to get enriched in 13 C by up to 1‰ as a result of a decreased stomatal aperture (Boutton 1996). Thus, water stress-induced 13 C enrichment of C 3 photosynthates might explain the observed differences in δ 13 C values of topsoils (Fig. 7). Together, the two processes of decreased decomposition of subsoil C at sites with high groundwater and 13 C-enriched-C input, especially in the topsoil at sites without groundwater influence, might explain why groundwater affected δ 13 C values in both subsoil and topsoil in the same direction.
Limits of Random Forest algorithms in explaining soil organic C The present study complements a series of previous studies that have used Random Forest to gain a better understanding about soil organic C dynamics (Padarian et al. 2020). Generally, such studies follow the same principle: First, a Random Forest model is trained to describe soil organic C with a set of explanatory variables (training data). Then, variable importance is evaluated by measuring the increase in model error after permuting (i.e. deleting the information of) each explanatory variable one at a time (Molnar 2019). Using this method, for example, Hobley et al. (2015) found climate variables to be key for explaining soil organic C storage in Eastern Australia. Also, in southeast Germany, Wiesmeier et al. (2012) identified precipitation and temperature to be important for explaining soil organic C. However, for whole Germany, our study and Vos et al. (2019) found climate to be insignificant for explaining soil organic C. Such seemingly contrasting findings can be confusing -they illustrate three major shortcomings of Random Forests in explaining soil organic C: & In contrast to mechanistic models, Random Forest models can only describe patterns within the predictor space they were trained on (Meyer & Pebesma 2020). Random Forest models that are trained to predict organic C in mineral soil (as in this study) will fail to predict organic C in organic soil because, per definition, the organic C content in organic soil is much higher than in mineral soil. This is merely because of the spread of soil organic C in the training data, not because of the contrasting processes governing organic C storage in mineral and organic soils. The latter is unknown to the model. & The permuation importance of an explanatory variable depends on (i) its variance, and (ii) on the other explanatory variables included in the training data.
If there is little variance in an explanatory variable, it will only be of minor importance for the model. For example, climate can only become important for explaining differences in soil organic C in the presence of significant longitudinal or altitudinal gradients in the target area (Adhikari et al. 2020). This might explain why in Australia (significant climate gradient), Random Forest identified climate as important for explaining soil organic C whereas in Germany (insignificant climate gradient) this was not the case. Furthermore, variable importance depends on the composition of the training dataset. For example, in the present study, if the Random Forest models had remained uniformed about soil groups or soil horizons, the models would have associated elevated soil organic C content in Chernozems and M-horizons with surrogate variables, e.g. soil texture. Selecting only meaningful explanatory variables, i.e. variables that are indicative for potential drivers of soil organic C, and having a wide coverage of these potential drivers in the training data facilitates the interpretability of Random Forest models. & Random Forests can only reveal associations, not causality (Wadoux et al. 2020).
However, if these limitations are kept in mind, unravelling associations with Random Forest models can spark hypotheses for subsequent mechanistic studies (cf. Kell and Oliver 2004). In conclusion, Random Forest models provide a valuable tool to shed light on systems whose complexity is not yet fully understood, such as soil organic C dynamics at regional scales.
Loosening hardpans is climate smart A significant bend in the depth gradients of TOC contents, C:N and POC:TOC ratios associated with hardpans suggests restricted deep root-derived C inputs in and below hardpans. In the case of TOC, the same trend was found even after mass correction to account for site-specific rock contents and bulk densities by evaluating depth gradients of TOC density, i.e. kg TOC ha − 1 at depth i relative to 0-10 cm (Fig. A3.4). These results confirm that loosening compacted soil layers has the potential for additional C storage in subsoils because loosening of compacted soil can facilitate deep rootderived C inputs. If loosening of compacted soils could achieve the same depth gradients of TOC density as that observed in non-compacted soil profiles, this could increase C stocks in 30-100 cm by 2.3 t C ha − 1 on average. However, the site-specific C-sequestration potential of soil loosening depends on the magnitude and depth of compacted soil layers. At sites with severely compacted soil layers starting already at 30 cm depth, loosening of soils could potentially accumulate up to 9 t ha − 1 additional organic C in 30-100 cm, while at sites with medium compaction only at 70-100 cm depth, loosing could increase subsoil organic C stocks by about 1 t ha − 1 only. If all compacted soil layers in 30-100 cm depth were loosened so that the depth gradients of TOC density resembled those of non-compacted soil profiles, German cropland could store 0.03 Pg (4%) more organic C in total in 30-100 cm depth. This corresponds to 119,874 thousand tonnes of carbon dioxide equivalents or 1.8 times the annual greenhouse-gas emissions of Germany's entire agricultural sector (UBA 2019). Considering that clay content increased with increasing level of compactness and that clay correlated slightly positively with the depth gradient of TOC, the carbon dioxide sequestration potential of cropland could even be greater. However, from what is known about topsoils (Poeplau et al. 2011), it is likely that subsoil C stocks would need at least a few decades to reach their new C equilibrium after soil loosening. Furthermore, deep soil loosening is not a trivial measure and might have to be repeated after a few years (Schneider et al. 2017). Biological soil loosening with biopore forming, tap-rooted pioneer crops could provide a viable alternative to mechanical deep loosening methods (Kautz 2015). It is generally easier to loosen shallow hardpans than those at greater depths. Loosening shallow hardpans also has greater potential for C sequestration and is thus most promising for climate mitigation.

Conclusions
Fresh photosynthates feed the long-term organic C stock of German cropland in 0-30 cm depth at a mean rate of 0.2 to 0.8 Mg C ha − 1 year − 1 . Organic fertiliser is another important source of C input, especially in grassland. In parts of north-west Germany, sandy soils still contain elevated amounts of historic C from past heathland and peatland cover.
In subsoils, holocenic parent material harbours large amounts of allochthonous C from upstream or upslope areas. This parent material-inherited C is a major contributor to subsoil C, especially in soils that developed from alluvial or colluvial deposits. Furthermore, vertical displacement of topsoil C along the soil profile contributed significantly to subsoil C storage in German agricultural land, especially in Anthrosols (topsoil burial with deep ploughing), Chernozems (bioturbation) and Podzols (illuviation). Overall, the origin of parent material and related pedogenesis were key drivers of subsoil TOC storage. The only evidence of direct root-derived C input into subsoils was a significant bend in the depth profile of TOC contents as well as C:N and POC:TOC ratios in compacted soil layers, suggesting that hardpans restrict root-derived C inputs into subsoils. The sustained loosening of hardpans by means of mechanical or biological melioration could increase deep rootderived C inputs and thus result in a significant increase in subsoil C stocks, enhancing soil fertility and climate mitigation.