Investigating soil carbon diversity by combining the MAXimum ENTropy principle with the Q model

Soil carbon diversity can be an important property for the stability of soil carbon. A problem is the lack of techniques for measuring this diversity. I suggest here the use of a combination of a general statistical principle, MAXimum ENTropy (MaxEnt), and a mechanistic model of organic matter decomposition, the Q model. The Q model provides the temporal development of the average carbon quality of litter and amount of soil organic C, which can be applied in a MaxEnt calculation to obtain a distribution of soil C over qualities. This distribution may not be the actual distribution but it is the most probable one. This distribution can be used to calculate aggregate properties for the total of soil C. I will use this distribution to calculate the temporal development of the variance in C quality as an expression of C diversity. The general tendency is that the variance declines with time of decomposition. Six long-term bare fallow (LTBF) experiments from different climatic and management conditions were used to investigate which system properties are most important for the temporal development of the variance. The initial quality of the litter forming soil C is the dominant property. Chemical shifts in NMR spectra were tested as a possible way of measuring the variance in C quality.


Introduction
Soil carbon is a major part of the global carbon cycle. In spite of that, soil carbon is usually described only by its quantity, and other aspects important for its dynamic behaviour are neglected. A reason for this lack of additional characterization can be that systematic quantification of other properties requires some theoretical framework to make these properties meaningful. One important property of soil C is the ease with which decomposers can use it as a food source; I will call this property C quality. Equal amounts of C with different qualities decompose at different rates, with higher qualities decomposing faster. The Q model (Ågren and Bosatta 1996, 1998) offers such a theoretical framework. Infrared spectroscopic methods (Joffre et al. 2001; Demyan et al. 2020) and NMR (Baldock et al. 1997) provide some techniques for measuring soil C quality but have not been used to describe the distribution of qualities.
In this paper, I will take the analysis a step further by looking not only at the quantity and the average quality of soil C but also at how the soil C is distributed over qualities (molecular diversity), by calculating the variance in quality. This is done by utilizing the maximum entropy (MaxEnt) concept (Harte 2011), which is applied for finding the distribution over C qualities with the least bias, given a set of constraints, in this case the average C quality. This concept has its origin in physics, where it is used to calculate the distributions having the maximum entropy. Lehmann et al. (2020) argue that functional diversity, including molecular diversity, is an important factor determining the stability of SOM against decomposition. One particular aspect is whether different carbon qualities are differentially sensitive to a temperature change. Recalcitrant substrates are likely to have higher energies of activation. This is formalised in the carbon quality-temperature hypothesis (Bosatta and Ågren 1999), which suggests that decomposition of recalcitrant carbon (low quality) responds more strongly to a temperature change than that of labile carbon (high quality). If this hypothesis holds (e.g. Fierer et al. 2005), it might be necessary to know the distribution of carbon over qualities for accurately predicting the effects on the global carbon cycle from a global temperature increase. This is an instance where knowing the variance of soil C quality would be useful.

The MaxEnt principle
MaxEnt is a general principle for finding the least biased probability distribution, $U(q,t)$, of C qualities, $q$, with time, $t$, consistent with existing knowledge. It has its origins in statistical physics, where it has been used to calculate how systems are distributed over microstates. In my context the microstates would correspond to qualities of carbon. Its use has been expanded to a general statistical principle (information entropy) as a measure of missing information (i.e. uncertainty) (see e.g. Dewar (2003) and Harte (2011) for more detailed accounts). In this case, our knowledge is the amount of soil C and its average quality, $\bar{q}(t)$, as calculated from the Q model. Formally, MaxEnt amounts to finding the distribution $U(q,t)$ that maximizes the entropy function

$$-\int_0^{q_0} U(q,t)\,\ln U(q,t)\,dq \tag{1}$$

where the integral is over the range of qualities ($q_0$ will be defined in the section on the Q model).
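The existing knowledge is introduced through the use of two Lagrangian multipliers [$k_0(t)$, coupled to the normalisation of the distribution (the integral over the probability distribution must be 1), and $k_1(t)$, coupled to the average quality], modifying (1) to

$$-\int_0^{q_0} U(q,t)\,\ln U(q,t)\,dq \;-\; k_0(t)\left[\int_0^{q_0} U(q,t)\,dq - 1\right] \;-\; k_1(t)\left[\int_0^{q_0} q\,U(q,t)\,dq - \bar{q}(t)\right] \tag{2}$$

The distribution that maximizes this expression (gives the most probable distribution) is (see Supplementary information for a derivation)

$$\ln U(q,t) = -1 - k_0(t) - k_1(t)\,q \tag{3a}$$

or

$$U(q,t) = e^{-1 - k_0 - k_1 q} \tag{3b}$$

The Lagrangian multipliers ($k_0(t)$ and $k_1(t)$) are determined by requiring that the constraints are satisfied

$$\int_0^{q_0} U(q,t)\,dq = 1 \quad \text{and} \quad \bar{q}(t) = \int_0^{q_0} q\,U(q,t)\,dq \tag{4}$$

Combining Eqs. (3b) and (4) gives

$$U(q,t) = \frac{k_1\,e^{-k_1 q}}{1 - e^{-k_1 q_0}} \tag{6}$$

The derived MaxEnt distribution can then be used to predict additional properties of the system. Here I will focus on the variance of the distribution

$$\sigma^2(t) = \int_0^{q_0} \left[q - \bar{q}(t)\right]^2 U(q,t)\,dq$$

where $\bar{q}$ and $k_1$ are functions of time, $t$.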
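The step from a known average quality to the multiplier $k_1$ and the variance can be sketched numerically. The following is a minimal sketch, not the Mathcad code used for the paper: it assumes the exponential distribution normalised on $[0, q_0]$, uses the closed-form mean and variance of that distribution, and finds $k_1$ by bisection. The function names and the values of `q0` and `q_mean` are hypothetical and are not taken from Table 1.

```python
import math


def maxent_mean(k1, q0):
    """Mean of U(q) = k1*exp(-k1*q) / (1 - exp(-k1*q0)) on [0, q0]."""
    x = k1 * q0
    if abs(x) < 1e-4:
        # Series limit near the uniform case; avoids 0/0 at k1 = 0.
        return q0 * (0.5 - x / 12.0)
    return 1.0 / k1 - q0 / math.expm1(x)


def maxent_variance(k1, q0):
    """Variance of the same truncated exponential distribution."""
    x = k1 * q0
    if abs(x) < 1e-3:
        # Series limit: uniform variance q0**2/12 with a small correction.
        return q0 ** 2 / 12.0 * (1.0 - x ** 2 / 20.0)
    return 1.0 / k1 ** 2 - q0 ** 2 / (4.0 * math.sinh(x / 2.0) ** 2)


def solve_k1(q_mean, q0, x_max=500.0):
    """Find k1 such that the distribution mean equals q_mean (bisection)."""
    lo, hi = -x_max / q0, x_max / q0
    if not maxent_mean(hi, q0) < q_mean < maxent_mean(lo, q0):
        raise ValueError("q_mean is outside the range this bracket can reach")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The mean decreases monotonically as k1 increases.
        if maxent_mean(mid, q0) > q_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values, not taken from Table 1.
q0 = 1.2
q_mean = 0.65
k1 = solve_k1(q_mean, q0)
variance = maxent_variance(k1, q0)
print(f"k1 = {k1:.4f}, variance = {variance:.5f}")
```

An average quality above $q_0/2$ gives a negative $k_1$, that is, a distribution weighted towards high qualities, and an average of exactly $q_0/2$ returns the uniform case.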

The Q model
I will use the Q model to calculate the average soil quality $\bar{q}$ and use this in eq. (4) to obtain $k_1$, giving the MaxEnt-defined probability distribution and its variance for six Long Term Bare Fallow (LTBF) sites (each site with its specific parameters, Table 1). The Q model is based on the idea that soil C consists of a mixture of carbon molecules defined by a distribution, $q_C(q,t)$, of different qualities, $q$, expressing how easily the decomposer community can assimilate them. The rate of microbial C assimilation I will use is

$$u(q) = u_0\,q^{b}$$

where $u(q)$ is the rate of assimilation per unit of soil C of quality $q$, $u_0$ is a basic rate (here in units of year$^{-1}$) including climatic effects and $b$ is a shape factor. Part of the assimilated carbon is lost in respiration and a part goes into decomposer biomass. The fraction going into decomposer biomass (carbon use efficiency, CUE) is a parameter denoted $e_0$ (for simplicity assumed to be constant but site specific). Decomposers die and their necromass becomes part of soil C. However, the necromass has a different quality composition than the assimilated C. I describe this change in quality with a dispersion function $D(q,q')$ defining the fraction of carbon assimilated at quality $q'$ that is returned to the soil with quality $q$ in the necromass. The parameterisation of $D$ will be such that decomposers always produce carbon of lower quality than what is assimilated. A consequence is that the initial quality, $q_0$, will also be the highest quality in the quality spectrum. The lower limit for quality is 0, which corresponds to an indecomposable substrate; decomposition could be stopped at some non-zero quality but that would require introduction of additional parameters. This gives a mass balance equation for $q_C(q,t)$ in which $q_0$ is the quality of the fresh litter input and $I_0$ is the rate of litter input. The values for $I_0$ are calculated to match the assumed steady state soil C stocks at the start of the LTBF experiments.
Using some simplifying assumptions (Ågren and Bosatta 1998), including replacing the dispersion function with its average shift in quality and setting $g_{11}$ constant, but site specific, gives expressions describing the increase in soil C with time, starting from a bare soil with a given constant rate of C input, $I_0$, and the change in average C quality

$$\bar{q}(t) = \frac{1 - e_0 - g_{11} e_0 b}{1 - e_0 - g_{11} e_0 (b - 1)} \cdots$$

where $\bar{q}(t)$ is the average quality of soil C and $q_t$ is the quality of a single litter cohort of age $t$. The parameter $g_{11}$ describes the average shift in quality of a carbon atom as it is assimilated from SOC by decomposers and then returned to SOC in necromass. For more details on the derivations, see Ågren and Bosatta (1998), and for alternative exact solutions, Bosatta and Ågren (2003). Corresponding expressions can be derived for the development of changes in amounts of C and quality under bare fallow (Hyvönen et al. 1998). I will use parameters from Menichetti et al. (2019) to illustrate effects on the distribution by different environmental conditions. They analysed six different long-term bare fallow experiments (LTBF) and estimated all the parameters for the Q model. These six experiments (Askov, Denmark; Grignon, France; Kursk, Russia; Rothamsted Bare Fallow, UK; Ultuna, Sweden; Versailles, France) include a wide range of environmental and management conditions and should cover the ranges over which this analysis should apply and provide a reasonable parameter space. The parameters estimated for the Q model are given in Table 1 and more details about the experimental sites can be found in Barré et al. (2010).
To my knowledge, there exist no experimental studies of the variance in quality. I will use $^{13}$C-NMR studies of chemical shifts to estimate a proxy. The magnitude of the chemical shift ($c_i$) identifies the chemical group $i$ a C atom is part of and the magnitude of the signal ($a_i$) the number of C atoms of this kind (see e.g. Kinchesh et al. 1995 for details). The chemical shifts have not been directly coupled to quality but Sjöberg et al. (2000) showed that NMR chemical shifts correlated with respiration of SOM under laboratory conditions and Mazzolini (2012) and Incerti et al. (2016) used NMR chemical shifts to build models of SOM turnover. Changes in the relative contributions, as observed in the amplitudes of chemical shifts, should therefore reflect changes in the quality distribution. The magnitude of the chemical shifts is not the same as quality, although the O-alkyl/alkyl ratio in the NMR spectra could be useful as an estimate of the average carbon quality (Baldock et al. 1997). Nevertheless, changes in the variance of the spectra should correspond to changes in the relative distribution of qualities and be comparable, at least qualitatively, to the variance calculated for quality. I have found two studies where I could calculate changes over time in the average chemical shift.
The average chemical shift is calculated as

$$\bar{c} = \frac{\sum_i a_i c_i}{\sum_i a_i}$$

where $a_i$ is the area (amplitude) of the signal at chemical shift $c_i$, and the variance in chemical shift as

$$\sigma_c^2 = \frac{\sum_i a_i \left(c_i - \bar{c}\right)^2}{\sum_i a_i}$$

The average chemical shift and the variance in chemical shift probably have no clear meaning in themselves. On the other hand, increases or decreases in the variance over time should indicate whether the chemical complexity of the sample is increasing or decreasing. Indeed, a sample where the chemical shift becomes concentrated to only one region would have zero variance, and the more the chemical shift is spread out over different regions, the larger the variance is.
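The area-weighted average and variance of the chemical shifts can be sketched in a few lines. This is an illustrative sketch only: the function name is hypothetical, and the region midpoints and signal areas below are made-up numbers, not data from the studies used in this paper.

```python
def shift_mean_and_variance(shifts, areas):
    """Area-weighted mean and variance of NMR chemical shifts."""
    if len(shifts) != len(areas):
        raise ValueError("shifts and areas must have the same length")
    total = sum(areas)
    if total <= 0:
        raise ValueError("total signal area must be positive")
    mean = sum(a * c for a, c in zip(areas, shifts)) / total
    variance = sum(a * (c - mean) ** 2 for a, c in zip(areas, shifts)) / total
    return mean, variance


# Hypothetical spectrum: region midpoints (ppm) and relative signal areas
# at two sampling times. These numbers are invented for illustration.
shifts = [30.0, 75.0, 130.0, 175.0]
areas_early = [0.20, 0.55, 0.15, 0.10]
areas_late = [0.30, 0.35, 0.20, 0.15]

for label, areas in (("early", areas_early), ("late", areas_late)):
    mean, variance = shift_mean_and_variance(shifts, areas)
    print(f"{label}: mean shift = {mean:.1f} ppm, variance = {variance:.0f} ppm^2")
```

Dividing by the total area means the areas do not need to be normalised beforehand, and a spectrum with all signal in one region returns a variance of zero.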
One of the two NMR studies is the CIDET experiment (Preston et al. 2009). These data include ten foliar litters and wood blocks at four sites followed for six years. The four sites were chosen to represent different climatic conditions and different litter qualities. The other study is by Sjöberg et al. (2000), who followed decomposition of humus layer materials from two Swedish forest experiments (Jädraås, Pinus sylvestris and Skogaby, Picea abies). At both sites material from control plots (C) and N-fertilised plots (N) was used. For details on the experiments, see the original publications.
All calculations were performed with Mathcad 15.0 (Parametric Technology Corporation, Needham, Mass., USA). Graphs were produced with SigmaPlot 14.0.

Results
As eq. 6 shows, not including the average quality as a constraint ($k_1 = 0$) leads to a uniform distribution between 0 and $q_0$, while adding average quality as a constraint, eq. 2, yields a distribution exponentially increasing or decreasing (eq. 6) with quality depending on whether $k_1 < 0$ or $k_1 > 0$. For all six LTBF sites, $k_1(t) < 0$, showing that the quality distribution is shifted towards high qualities, Fig. 1. For all six sites and all times $k_1$ is numerically small and decreases over time, Fig. 2, suggesting that the C quality distribution comes closer and closer to a uniform distribution ($k_1 = 0$) over time.
The estimated development of the variance in quality of a single litter cohort during 1000 years of decomposition in the six LTBF sites is shown in Fig. 3. In five of the six sites the variance decreases as decomposition proceeds, although the decrease is hardly noticeable in a time span of 1000 years. Five of the sites develop very similarly, while one site, Rothamsted, sticks out by having not only a much larger variance but also a variance that increases with time. The decline in variance with time becomes more visible when we consider the development of a soil over 2000 years, where a constant rate of litter inputs is applied during 1000 years and then interrupted when the soil is assumed to have reached steady state and then left to develop as a bare fallow, Fig. 2. The decline slows down as the system approaches a steady state (time 0 in the figure) and accelerates when the system is turned into a bare fallow. A time of 1000 years is not enough to reach steady state, which is why there is a jump in the curve at time 0. Again, Rothamsted sticks out by developing in the opposite direction of the other five sites.
Which site properties are responsible for the differences in the variance? I have tested steady state C store and average C quality against the variance, but neither shows any relation, Fig. 4. I have also calculated the sensitivity of the variance at steady state to a parameter, $p$. The sensitivity, $s$, is defined as

$$s = \frac{\Delta\sigma^2/\sigma^2}{\Delta p/p} \tag{14}$$

where $\sigma^2$ is the variance in quality. I have chosen $\Delta p/p = \pm 0.01$, where both a small positive and a small negative variation in $p$ have been used to see if the sensitivity is symmetric. With this definition of sensitivity, a sensitivity of 1 means that the variance changes in the same proportion as the parameter, and a positive value means that the change is in the same direction as the change in the parameter. The sensitivities to the parameters for the different sites are given in Table 2.

Fig. 1 Distribution of C over qualities $U(q)$ at the start of the six LTBF sites. The ranges of the distributions are from 0 to $q_0$. The lines are from the top: Ultuna, Askov, Grignon, Kursk, Versailles, Rothamsted
Most sensitivities are < 1 and negative, suggesting that sensitivities are small. Indeed, the sensitivity to $u_0$ is zero. On the other hand, changes, independent of sign, in the quality of litter forming SOM, $q_0$, always lead to large changes in the variance. Rothamsted is the site with the largest sensitivities, whereas the other sites have smaller and rather similar sensitivities.
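The sensitivity calculation, a relative change in the variance divided by a relative change of 1% in a parameter, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the stand-in function is the variance of a uniform distribution on $[0, q_0]$, not the steady-state variance from the full Q model and MaxEnt calculation.

```python
def sensitivity(func, p, rel_step=0.01):
    """Relative change in func divided by the relative change in p."""
    base = func(p)
    perturbed = func(p * (1.0 + rel_step))
    return ((perturbed - base) / base) / rel_step


def uniform_variance(q0):
    """Stand-in for the variance: uniform distribution on [0, q0]."""
    return q0 ** 2 / 12.0


q0 = 1.0  # hypothetical value, not from Table 1
s_plus = sensitivity(uniform_variance, q0, rel_step=0.01)
s_minus = sensitivity(uniform_variance, q0, rel_step=-0.01)
print(f"s(+1%) = {s_plus:.3f}, s(-1%) = {s_minus:.3f}")
```

With this stand-in, a 1% change in $q_0$ changes the variance by about 2% in either direction, and running both signs of the step shows how symmetric the response is.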
The development of the variances of the chemical shifts in the two experiments is shown in Figs. 5 and 6. For the litters, the development is clear, with variances increasing with increasing mass loss for all litters and sites. There are no obvious differences among the litter types or sites. The development of the variances in the humus materials differs between materials collected at the two sites. Materials from the Skogaby site have variances that increase with incubation time, whereas Jädraås materials have initially high variances that decrease slightly with time. N treatment leads to somewhat smaller variances at both sites.

Discussion
The diversity of soil C should be the base for the functional diversity of soil organisms. However, it is a property that is difficult both to define and to observe, and no metric has been agreed upon to measure it. Here, I have taken a theoretical approach to analyse how C diversity changes as litters decompose and soil C changes. The theoretical approach, the MaxEnt principle, is not a mechanistic model but a probabilistic one providing the best description given existing knowledge; maybe we must be content with just finding the most probable distribution rather than the actual one. If additional information becomes available, it is easy to include by adding new constraints in eq. 2. The Q model is, on the other hand, a mechanistic model of C and quality development, but could for this purpose be replaced by any other model or empirical information.
A difficulty with this study is the lack of robust ways of testing the predictions. I suggest that the variances in chemical shifts measured by NMR and those calculated from the MaxEnt distribution can be compared qualitatively, but not quantitatively. Quantitative comparisons are not possible because the two variances are on different scales; the quality in the Q model ranges between 0 and 1 and the chemical shifts between 0 and 200. The MaxEnt variances are all, except for the Rothamsted site, decreasing over time while the NMR variances are increasing. I have no explanation for this different behaviour, but it can be a result of important constraints missing in the MaxEnt calculations. It could also be a result of the NMR observations extending over months whereas the MaxEnt calculations cover decades and might miss finer details on shorter time scales. On the other hand, Nunan et al. (2015) compared soil microbial functional diversity profiles in four LTBF sites (three of which are included in this study) with their cultivated counterparts and concluded that communities in the LTBF soils were exposed to a less diverse range of substrates, in agreement with my predictions of decreasing variance. The most important parameter for the MaxEnt variance is the quality of the litter entering the soil, $q_0$, Table 2. We can understand the importance of this parameter, because increasing it means that carbon is distributed over a larger range of qualities and hence the variance should increase.

Negative changes in parameters give similar sensitivities but of opposite sign, except for $q_0$. Sensitivities are defined as the relative change in the variance to a similar change in the parameter, Eq. 14
Since $k_1$ is numerically small, the MaxEnt calculated distributions will be close to uniform distributions. Because the variance of a uniform distribution on the interval from 0 to $q_0$ is $q_0^2/12$, the sensitivity to $q_0$ will be large, in particular for the Rothamsted site with its large $q_0$, Table 1. The differences in variances among sites, Fig. 4, are largely a reflection of differences in $q_0$.
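How close the variance stays to the uniform value can be made explicit with a short expansion. This is a sketch that assumes the MaxEnt distribution is the exponential form normalised on the interval from 0 to $q_0$; the symbols $\sigma^2$ for the variance and $x = k_1 q_0$ are introduced here only for the illustration.

```latex
\begin{align*}
U(q) &= \frac{k_1 e^{-k_1 q}}{1 - e^{-k_1 q_0}}, \qquad 0 \le q \le q_0,\\
\sigma^2 &= \frac{1}{k_1^2} - \frac{q_0^2}{4\sinh^2(k_1 q_0/2)}
          = q_0^2\left[\frac{1}{x^2} - \frac{1}{4\sinh^2(x/2)}\right],
          \qquad x = k_1 q_0,\\
\frac{1}{4\sinh^2(x/2)} &= \frac{1}{x^2} - \frac{1}{12} + \frac{x^2}{240} - \dots,\\
\sigma^2 &\approx \frac{q_0^2}{12}\left(1 - \frac{(k_1 q_0)^2}{20}\right)
          \qquad \text{for } |k_1 q_0| \ll 1 .
\end{align*}
```

To leading order the variance equals the uniform value whatever the sign of $k_1$, and the correction is of second order in $k_1 q_0$.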
Which are the biological processes driving the changes in C quality variance? In fresh litters, the high quality C should initially be consumed preferentially, which should narrow the C distribution and decrease the variance. Later, C compounds from microbial necromass will be important in the C distribution; Barré et al. (2018) found that microbial C can contribute more than 50% of soil C in four of the LTBF sites (Askov, Rothamsted, Versailles and Ultuna). It is unknown how this microbial C is distributed over the quality spectrum and it can, therefore, drive both increases and decreases in the variance. At the same time, it is noteworthy how little the variance changes over time, again emphasising the dominant effect of the initial litter quality. Wickings et al. (2012) studied changes in chemical complexity in relation to three hypotheses on the development of the chemical composition during decomposition of litters. Their three hypotheses were (i) the chemical convergence hypothesis (with decomposition substrates becoming more and more similar), (ii) the initial litter quality hypothesis (the initial litter quality controls the development of substrate composition over time), and (iii) the decomposer control hypothesis (the composition of the decomposer community determines the development of substrate composition). Their experiment pointed mostly in favour of hypotheses (ii) and (iii). My study of six different bare fallow experiments is not directly comparable to the study by Wickings et al. (2012), as litters are not the same in the different sites, but the high sensitivity to initial quality (Table 2) should be a support for hypothesis (ii).

Fig. 5 Development of variances in chemical shifts as function of mass loss of litters. The sites are: MAR Morgan Arboretum, CBR Rocky Harbour, NH1 Nelson House 1, and TER Termundee. The litters are: gfh fescue grass, dpt aspen leaves, and csb black spruce needles
The increasing variance over time in the NMR spectra is, on the other hand, in contrast to hypothesis (i), which should predict a decreasing variance. It is difficult to judge the validity of hypothesis (iii) as it is not possible to separate effects of decomposer properties from other site differences in the bare fallow experiments. It should also be noted that although my calculations suggest a decreasing variance with time, this does not mean a convergence in chemical composition, as the C distributions will be centred at different qualities depending on substrate and decomposer community. Adding further constraints in eq. 2 is a development of this approach that should give more insights into what drives changes in the chemical complexity of soil C.
Funding No external funding was obtained for this work.
Code availability Mathcad code is available upon request to the author.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.