Introduction

Current and expected changes in climate and atmospheric CO2 concentration have prompted the study of the mechanisms used by vegetation to survive to changing environmental conditions. One of these mechanisms is the adjustment of their carbon allocation scheme (Lacointe 2000; Franklin et al. 2012; Xia et al. 2017). Carbon allocation is a concept that involves all carbon cycle–related processes that take place within vegetation such as the following: the assimilation of atmospheric CO2 via photosynthesis; the partitioning of the photosynthetic assimilates to vegetation parts (from here on compartments) for growth or for other processes such as maintenance, defence, and storage; and the time carbon atoms remain in the vegetation before being released back to the atmosphere (Lacointe 2000; Schimel et al. 2001; Trumbore 2006; Canadell et al. 2007; Franklin et al. 2012; Körner 2017). Although these processes take place in different spatial and temporal scales, they are interconnected.

Photosynthesis, for example, is a metabolic pathway that takes place in the chloroplast and requires light, CO2, and water. The amount of captured light available for photosynthesis depends on molecules (pigments) whose abundance is determined by the phenological state of the plant; e.g. deciduous plants have green leaves at the beginning of the growing season because they are rich in chlorophyll (pigment that can absorb blue and red lights); thus, when it is no longer produced and is broken down (in autumn), the leaves take the colors of the remaining pigments, which are not as efficient absorbing light. Ontogeny also plays an important role, because older plants have slower metabolism than younger ones (Hartmann et al. 2018). In addition, not all leaves in a plant are exposed to the same quantity and quality of light simultaneously, some may be shaded by other leaves of the same plant or competing plants, creating heterogeneity in photosynthetic rates. Thus, the amount of light available for photosynthesis at the ecosystem scale depends on the capacity of the canopy to intercept light, quantified by the leaf area index (LAI), i.e. the sum of all leaf layers in a canopy (Waring and Running 1998). For reviews on plant–environment interactions across scales refer to Waring and Running (1998), Reichstein et al. (2014), Hartmann and Trumbore (2016), and Hartmann et al. (2018).

It is then evident that the metabolic activities that occur at short spatial and temporal scales depend on precursors whose availability is determined by processes taking place at larger scales, such as competition for light, water, and other resources. Unfortunately, not all processes involved are understood at the same level of detail and, or spatio-temporal resolution, which creates opportunities for multiple interpretations and uncertainties for upscaling in models. This is problematic because if processes are addressed with a different level of detail, there is an unbalanced representation of reality (Waring and Running 1998), and this biased representation can lead to inconsistent predictions (Govingjee 2009). In fact, coupled climate-carbon cycle models predict divergent trajectories of future fluxes among Earth’s major carbon reservoirs, with no consensus on whether terrestrial ecosystems will act as a net source or sink of carbon with respect to the atmosphere (Friedlingstein et al. 2006; Friedlingstein et al. 2014).

The effort to reconcile diverse empirical findings and attain realistic predictions has prompted the use of a large number of functions and parameters in models (Quaiser et al. 2011). However, many parameter values are unknown because they are not necessarily measurable (Trumbore 2006), and increasing model complexity has obscured the identification of the main processes that control overall system behaviour. Perhaps this is why the quest to find the source of uncertainties has been limited to the comparison of numerical outputs of models (De Kauwe et al. 2014), and reviews of their conceptual design (Franklin et al. 2012; Walker et al. 2014; Merganičová et al. 2019; Trugman et al. 2019). Although these studies have inspired model classifications (e.g. based on the carbon allocation scheme: static or dynamicFootnote 1 (Litton et al. 2007)) that can indicate potential predictive uses of a model (until a certain extent), these conceptual classifications can be ambiguous.

We identified two main sources of ambiguity in the description of CA in models. The first source is that despite previous attempts to clarify terminology, the concept of carbon allocation is still being used to describe both, patterns and processes, which leads to an ambiguity originally exposed by Litton et al. (2007); carbon allocation is a general term that overarches patterns (biomass) and processes (flux and partitioning) according to Litton et al. (2007). As an example, the sentence the carbon allocated to the vegetation compartment X may have three different interpretations depending on the context: (i) the amount of carbon present in X at a certain time (biomass), (ii) the portion of photosynthetic input that goes to X (partitioning), and (iii) the flux of carbon into and out of X. The second source of ambiguity is that it is unclear if the term dynamic carbon allocation, which is commonly used in the modelling literature, implies dynamic functions for fluxes, or for partitioning of photosynthetic input. As we will show later, there are important consequences on carbon storage, cycling, and release due to the introduction of these assumptions. In addition, since the former classifications are only based on ‘descriptive rather than analytic’ assessments of models (Xia et al. 2013), they ignore the implications that their mathematical representations may have on predicted dynamics.

Evidently, the assessment of the mathematical representations of carbon allocation (model structure) is of major importance; while the parameter nonidentifiability and a wrong choice of modelling approach are undeniable causes of poor model performance, an important cause of error in models involves problems with model structure. Some characteristics of model structures can help to identify a priori certain model behaviours, e.g. the stability of the system modelled. It is important to identify whether a system is in steady state or not because that is a prerequisite for the calculation of certain model performance diagnostics (Sierra et al. 2016). The analysis of the model’s stability can also have a tremendous impact in the perception of its resilience, i.e. the maximum perturbation that can be taken without causing a shift to an alternative stable state.

There are other types of behaviours such as hysteresis loops, oscillations, and multiple steady states (Strogatz 2014), which have important consequences for the stability and the resilience of a system, and can be the result of nonlinear interactions. Nonlinearities occur when certain types of interactions between the parts of the modelled system (e.g. interference, cooperation, and competition) are symbolized mathematically with nonlinear terms such as products and powers (Strogatz 2014).

Overall, a better understanding of model structure can contribute to develop concrete criteria for selecting relevant modelling approaches and performance diagnostics. However, the comparison of the mathematical formulation of multiple models is impaired by the fact that they are encoded in different (often cryptic) programming languages, and the published model descriptions do not follow a standard such as the ODD (Overview, Design concepts, and Details) protocol (Grimm et al. 2006; Grimm et al. 2010). But perhaps the main caveat is that when equations are presented as lists or sets, the interactions between variables are not conspicuous. Thus, it is not always clear if there are dependencies between variables that can lead to structural nonidentifiabilities (Raue et al. 2009). This is why the general matrix model of the terrestrial carbon cycle (see Eq. 1) was a leap forward in the field (Luo et al. 2003; Luo et al. 2012). It can be expressed as

$$ \begin{array}{@{}rcl@{}} \frac{dX(t)}{dt} &=& A\xi(t)CX(t) + BU(t),\\ X(t=0) &=& X_{0} \end{array} $$
(1)

where X(t) is a vector of ecosystem carbon pools, ξ is a scalar representing effects of temperature and moisture on the carbon transfer among pools, A and C are matrices containing carbon transfer coefficients between plant, litter, and soil pools, B is a vector of partitioning coefficients of the photosynthetically fixed carbon to plant pools, U(t) is the photosynthetically fixed carbon, and X0 is a vector of initial values of the carbon pools (see also Fig. 1). Within this framework, the modelled terrestrial C storage capacity can be decomposed into a few traceable components that are built upon biogeochemical principles.

Fig. 1
figure 1

Matrix approach to carbon cycle modelling (Luo et al. 2003). Empirical observations inspire model development, and the models can be encoded using ordinary differential equations. These equations can be generalized into a matrix representation, which in turn can be used to evaluate how the C sinks vary with environmental forcing over time. X(t) is a vector of ecosystem carbon pools, ξ is a scalar representing effects of temperature and moisture on the carbon transfer among pools, A and C are carbon transfer coefficients between plant, litter, and soil pools, B is a vector of partitioning coefficients of the photosynthetically fixed carbon to plant pools, U(t) is the photosynthetically fixed carbon, and X0 is a vector of initial values of the carbon pools. Adapted from http://www.cesm.ucar.edu/events/wg-meetings/2017/presentations/bgcwg+lmwg/luo.pdf

Although this matrix equation has an important role mapping the mathematical formulations to the modelled biogeochemical processes, it has a few limitations: the partitioning coefficients (B) are fixed; and it does not allow for nonlinear interactions such as the case when photosynthetic rates depend on the storage of carbon in other compartments. This is problematic because complex systems such as terrestrial ecosystems are not necessarily linear (Luenberger and Hill 1979; Zobeley et al. 2005). In fact, certain nonlinearities have been revealed when there is competition for light in models that can accurately predict productivity and species composition (Purves and Pacala 2008). Thus, it is necessary to define a new framework that (a) has more flexibility to accommodate different types of inputs (e.g. C assimilation rate, Gross Primary Productivity—GPP, Net Primary Productivity—NPP); (b) allow for a dynamic partitioning of inputs or storage reserves; and (c) can accommodate nonlinearities among system components.

In this contribution, we studied the process of CA in vegetation at the ecosystems level, from a dynamical systems perspective. We addressed the following questions: (1) How can we express the carbon allocation concept within the context of dynamical systems? (2) Is there a mathematical framework that can accommodate nonlinearities? (3) Can we systematically identify common model properties and categorize the models accordingly? We reviewed published model descriptions and constructed a database with a selection of them. This database aided us during the analysis of the sets of equations that represent CA in vegetation. Based on this model synthesis, we were able to identify key characteristics from which we inferred the proposed conceptual framework and mathematical tool.

Methods

This study is a comprehensive analysis of the common mathematical expressions used to model CA in vegetation. We synthesized different mathematical expressions used to represent CA in models taking advantage of the mathematical concept of dynamical system. This concept is a general abstraction to describe a rule that determines the future behaviour of a set of variables based on their current state (Luenberger and Hill 1979). Thus, in a dynamical system, the patterns from the present are a result from the patterns in the past. These time linkages among variables can be represented using difference or differential equations depending on whether the dynamic behaviour occurs in discrete or continuous time, respectively. A difference equation relates the value y(k), at point k, to values at other points (e.g. y(k + 1) = ay(k),k = 0,1,2,...). A differential equation connects a function y(t) defined on an interval of continuous time and some of its derivatives. For further details on these concepts, please refer to Luenberger and Hill 1979, p. 27.

Working definition of carbon allocation

In order to avoid the ambiguities mentioned in the ‘Introduction’ section, we contextualized the CA concept within the more general theory of dynamical systems and modify the matrix representation from Eq. 1 as follows:

$$ \begin{array}{@{}rcl@{}} f = u \cdot \mathbf \upbeta + \boldsymbol B \cdot \mathbf x \end{array} $$
(2)

where f is a function valued vector that relates the changing quantities that take place between the vegetation compartments; u is a scalar function that represents the primary productivity—system input—and may be an assimilation rate, GPP, or NPP; β is a vector containing the partitioning coefficients of photosynthetic input; B is a matrix that contains the carbon transfer coefficients between plant, litter, and soil compartments; and x is a vector of states for vegetation (state variables). This is illustrated with an example of a two-compartment system (Fig. 2).

Fig. 2
figure 2

General mathematical representation of carbon allocation (CA) in a 2-compartment vegetation model. The colors facilitate the identification of the different modules; grey: entire CA process, blue: photosynthetic input, green: partitioning of the input, and orange: C transfers between and out of the compartments. The arrows point towards the direction of the movement of the compartment content. The letters above them symbolize the parameters that regulate this flux. Other symbols are x: vector of states for vegetation (state variables); \(\dot {\mathbf x} = d \mathbf x / d t\): changing quantity of vegetation compartment x with respect to time (state variables x1 and x2); f: function valued vector that relates the changing quantities that take place between the vegetation compartments; u: scalar function that represents the primary productivity—system input—and may be an assimilation rate, GPP, or NPP; β: vector containing the partitioning coefficients of photosynthetic input; B: carbon transfer coefficients between plant, litter, and soil compartments. The matrix representation was derived as described in Anderson (1983) and Sierra et al. (2012). The last equation shows the units that the different components may have, highlighting their participation as fluxes (f, u), coefficients -fractions- (β), and rates (B) in the model

Based on this new representation, we propose the following definition: Carbon allocation is the balance (net flux) of the C flows into, between, and out of the vegetation compartments, because the amount of carbon in a single compartment at a given time (t) is the balance between the C that has flown into and out of this compartment. This definition makes it possible to distinguish between the different cases constantly grouped as carbon allocation by assigning each case to its corresponding term in Eq. 2 as follows: (a) ‘the C in X’ refers to the biomass: x; (b) ‘the C partitioned to X’ is the fraction of the input that goes to that compartment, i.e. partitioning: β; and (c) ‘the carbon allocated to X’ is the result of the whole process, i.e. the net flux: right hand side of the Ordinary Differential Equation (ODE) 2.

This definition was the result of the review of several published model descriptions and is the basis of the mathematical framework that we propose and use to further analyze the main characteristics of the model descriptions (e.g. driving variablesFootnote 2, parameters2, nonlinearities). The organization of the equations within this framework also facilitates the identification of rates ([time− 1]), fractions ([], unitless), and fluxes ([mass ⋅time− 1]), just by observing their units. As means to automate the analysis of published model descriptions (PMDs) and promote the use of this framework to understand the representation of CA in models, we developed a model database and an accompanying Python 3 package called bgc-md.

Selection of published model descriptions

We reviewed PMDs of ecosystem models and extracted the equations and descriptions contained in their CA module. Our literature search was not exhaustive, but we obtained a representative sample of PMDs by focusing our selection to published reviews (Franklin et al. 2012; De Kauwe et al. 2014; Walker et al. 2014), and Web of Science. The search in Web of Science took place in February 2020, using the following keywords: models, simulations, environmental sciences, and carbon allocation; we found 167 published articles from which we selected only those that had a complete description of the module of carbon allocation in vegetation, where processes were represented using ordinary differential equations (ODEs) with a consistent use of symbols.

Protocol for analyzing the PMDs using the Python package bgc-md

The Python 3 package, bgc-md (https://github.com/MPIBGC-TEE/bgc-md) consists of a model database and additional code that (i) extracts the information from each database entry, (ii) generates reports, and (iii) performs computations such as model runs, calculation of jacobian and eigenvalues, and generates various plots for comparison. The entries in the database consist of text files written in YAML format (www.yaml.org). Each entry corresponds to a selected publication—PMD and is created as follows:

  1. 1.

    The first lines of an entry are the metadata, i.e. the information needed in order to identify the model, the author of the entry, the publication where it came from, and some model characteristics such as the partitioning scheme (fixed or dynamic) and whether it was claimed that the model prioritizes a compartment during partitioning, or not. The doi of the original publication is also included in order to automatically generate citations using the API from crossref (www.crossref.org).

  2. 2.

    After the metadata, there is a section called ‘model’, which is divided into subsections (e.g. state variables, photosynthetic parameters) that consist of a list of symbols that represent variables or parameters. If the function to calculate a variable is provided (e.g. NPP = GPP - R), the equation is written using symbolic mathematics according to the syntax of the python package SymPy (Meurer et al. 2017).

    The final component of the ‘model’ section is the subsection ‘components’. Here the sets of ODEs are factorized into the matrix form of Eq. 2 (see also Fig. 2).

  3. 3.

    The bgc-md package computes the equations in the matrix form and produces reports in html format. These reports summarize the PMDs and display the set of equations in a readable way. In order to allow a quick comparison of a list of models, the package can also generate a website with reports and graphs that can be navigated.

Code availability

All of the simulations and figures for this work can be reproduced using the code and data provided in the supplementary material .

Results

From a total of 182 reviewed publications, only a few of them (19) met the requirements of having a CA scheme described with a complete set of difference or differential equations (Table 1). Evidently, we did not include PMDs without equations (e.g. Chen and Reynolds 1997). Other types of publications excluded from our review were those concerned with aquatic vegetation (e.g. Soetaert et al. 2004), tree ring simulations (e.g. Eglin et al. 2010; Lazzarotto et al. 2009; Li et al. 2014), pipe models (e.g. Valentine 1999; Schiestl-Aalto et al. 2015; Falster et al. et al. 2017), annual plants (e.g. Fox 1992a), vegetation systems with two compartments or less (e.g. Huntingford et al. 2000), and models that represented the carbon allocation using allometric equations (e.g. Weng et al. 2019) or structural equation modelling (e.g. Hofhansl et al. 2014).

Table 1 Classification of the different models analyzed according to their mathematical structure

The vegetation systems described in the PMDs correspond to different hierarchies, ranging from individual plants (2 models), to forests (4 models), regions (3 models), up to the whole globe (10 models). In terms of time resolution, the processes described in the PMDs often take place at different scales; there can be daily simulations of photosynthesis, evapotranspiration, and soil water dynamics, and annual updates of vegetation structure and plant functional type population densities (Sitch et al. 2003). Another example of such a wide time-scale range was observed in the model Tethys-Chloris by Fatichi and Leuzinger (2013), in which the photosynthesis, energy, and water fluxes are resolved hourly, and the ‘carbon allocation’ and turnover take place at a daily time step. This means that the input (u) of these models is often portrayed as a faster process than the partitioning (β) and cycling (B).

Moreover, different modelling approaches (conceptual and mathematical) have been used to simulate CA as a whole, and the partitioning in particular. Table 2 summarizes the main groups in which the models can be classified according to these different approaches. Our contribution to this classification, which builds on the previous work of De Kauwe et al. (2014) and Walker et al. (2014), is a classification based on the mathematical formulation of CA (f(x,t)), derived from the fact that several models have been described using linear ODEs, nonlinear ODEs, or difference equations.

Table 2 Classification of the carbon allocation (CA) module of different models according to the main conceptual and mathematical approach used

Our literature search revealed that in 11 of the 19 PMDs selected, it was claimed that they had a ‘dynamic carbon allocation’ because their objective was to simulate the vegetation response to environmental cues (e.g. lack of water or light) via selective partitioning of newly photosynthesized carbon to a limiting compartment. However, three of these 11 had a fixed partitioning scheme (fixed coefficients) and the environmental scalars were only found in the photosynthesis or turnover modules (see Table 1). Within such a scheme, the changes in environmental conditions only affect the amount of C entering or leaving the system as a whole, i.e. the limiting compartment would not be favoured in the partitioning of C. This is an example of how alternative interpretations of the term ‘dynamic carbon allocation’ can lead to misleading or ambiguous model descriptions.

This supports the need to use the working definition of CA proposed in the methods to continue with our analysis, leaning on the biogeochemical relevance of the terms in Eq. 2 (photosynthetic input-u, partitioning-β, cycling-B, and stocks (state variables)-x). For this reason, we analyzed the components of each term instead of searching for connections between numerous equations spread across modules (e.g. stomata, physiology, canopy, carbon balance), which is a very common way to organize the equations in ecosystem models. Identifying these connections, and especially the type of interactions between variables and parameters is important because some dependencies between state variables can lead to nonlinearities.

In light of this working definition, we could find the characteristics that the 19 PMDs had in common, such as their driving variables and their interactions. We found that in 14 of the 19 PMDs, there is a lack of carbon exchange between the vegetation compartments. This characteristic was diagnosed by the bgc-md package, which automatically identified that their cycling matrix B is diagonal—the elements of the off diagonal are zeros (see Table 1). Moreover, the only cases in which the compartments exchange C is when there is a labile C compartment, e.g. the model DALEC (Williams et al. 2005).

Furthermore, we found two distinct ways to represent respiration in vegetation: (a) growth respiration is often portrayed as a percentage of the C influx, which is a calculation performed within u independently from the amount of C stored in the compartments; in contrast, (b) maintenance respiration is a flux that depends on the amount of C stored, because it is the product of the multiplication of state variables and cycling rates (Bx). These two representations have different implications. In case (a), respiration represents the cost of growing organs, which is the amount of photosynthetically fixed C that cannot be used to build new tissues. In case (b), respiration is the metabolic cost of maintaining the existent tissue, which depends on how much tissue there is.

In most PMDs, the environmental variables driving the models (e.g. temperature, water and nutrient content, light availability) were found in the equations that compute the photosynthetic input (u), but sometimes air temperature and atmospheric CO2 concentration were implicitly included in environmental scalars that modify cycling rates. In addition, the photosynthetic input generally does not depend directly on the C content of the leaves (one of the state variables), rather on the leaf area index (LAI), which is not expressed in terms of leaf C.

In contrast to the input dependency on environmental cues, the C transfers that shape the compartments via partitioning (β) of photosynthetic input and C cycling (B) are seldom driven by climate. In some cases, the partitioning is constrained by the ratio between some compartments (e.g. C foliage : C roots), but the most common constraint is the use of fixed coefficients that should sum up to 1 (see Table 1). One example is the model by King (1993), which was designed to analyze the influence of partitioning to root and foliage on forest production. This model has a structure in which the partitioning (β) and the cycling (B) contain parameters, instead of state variables or time-dependent variables. Thus, its partitioning scheme is independent of changes in the environment. In other schemes, the partitioning can switch between fixed and dynamic, depending on the sensitivity to changes in resource availability (e.g. the model CEVSA2 by Gu et al. (2010)).

Some PMDs have an intricate C cycling dynamic; for example, the model proposed by Hilbert and Reynolds (1991) has a structure inspired by the hypothesis of balanced root and shoot activity (Davidson 1969) and the response of specific shoot activity to resource availability. This model has five state variables: mass of leaf proteins, leaf structural components, roots, substrate carbon, and substrate nitrogen. The photosynthetic carbon is produced in the substrate compartments, and from there it is cycled to the other compartments. In cases like this, where all the photosynthetically fixed C is first stored in a compartment, there is no partitioning to multiple compartments, e.g. β = [1,0,0,0]. Conversely, the B matrix is highly nonlinear because the C cycling depends on the state variables. This is why we broadened the mathematical framework and propose a new one that can accommodate nonlinearities.

The new mathematical framework

Based on the three categories in Table 2 and the fact that Eq. 2 can have particular cases based on the dependencies of its terms, we propose a new mathematical framework to study CA representations in models (Fig.3). This new framework takes into account that the mathematical abstraction from Fig. 2 may vary among models, depending on how their variables and parameters interact with each other, i.e. whether they depend on each other or not. These dependencies also have informative ecological implications such as positive feedback loops in the scalar function of system input u, or sink-dependent fluxes of the cycling matrix B (see Figs. 2 and 3), as those represented with the nonlinear model in the next section. These dependencies ultimately determine whether a model is linear or nonlinear. Nonlinear models have fluxes that depend on interactions between the state variables (symbolized with products and powers). This entails that the models have nondiagonal compartmental matrices where the off-diagonal values of the matrix also have state variables because the C transfer rates depend on them. In these kind of models, the compartments exchange C, so it is also likely that they have at least one nonstructural C compartment that can act as an intermediary in the C exchange.

Fig. 3
figure 3

Model classification according to their mathematical formulation, where x(tn+ 1) is a vector of states for vegetation at time tn+ 1. f(x,g(tn)) represents the dependency of the function valued vector f on state variables (x) and time-dependent functions (g(tn)); it can also depend on state variables and time: f(x,t). The time step size is Δt = tn+ 1tn. The dependency of u in time and state variables is represented by u(x,t). β(t): partitioning function valued vector that represents the time dependency of β, which may also depend on vegetation compartments (allowing for nonlinearities): (β(x,t)). B(t) represents the time dependency of B

We classified the 19 PMDs according to their mathematical formulation—structure (see Table 1). The analysis behind this classification can be found on the reports generated for each of the PMDs using the bgc-md package (see the website). Notice that not all models have separated photosynthetic input u and partitioning β. Particularly, the models with a partitioning scheme that depend on phenology often formulate this dependency as a set of if statements or piecewise functions. Thus, the u and β fuse together into the vector of photosynthetic inputs to each compartment u, e.g. DALEC (Williams et al. 2005) and ISAM (El-Masri et al. 2013) (see Table 1). Another model with a partitioning scheme that depends on phenology according to its description is CASTANEA (Dufrene et al. 2005), but this publication could not be included in the database because of the lack of equations.

Implications of nonlinearities using an example of a vegetation system

Nonlinearities can have a major impact on the model dynamics, when compared with a linear model with the same number of compartments and connections among them. We can illustrate this using a couple of models represented with the following sets of ordinary differential equations:

  • A linear system:

    $$ \begin{array}{@{}rcl@{}} && \dot{C_{\text{NSC}}} = C_{\text{NSC}} \cdot (- \eta_{r} - \eta_{w} - \gamma_{f\text{NSC}}) + C_{f} \cdot \eta_{\text{NSC}} \end{array} $$
    (3)
    $$ \begin{array}{@{}rcl@{}} && \dot{C_{f}} = C_{\text{NSC}} \cdot \gamma_{f\text{NSC}} + C_{f} \cdot (- \eta_{\text{NSC}} - \gamma_{f}) + \eta_{f} \cdot k_{1} \end{array} $$
    (4)
    $$ \begin{array}{@{}rcl@{}} && \dot{C_{w}} = C_{\text{NSC}} \cdot \eta_{w} - C_{w} \cdot \gamma_{w} \end{array} $$
    (5)
    $$ \begin{array}{@{}rcl@{}} && \dot{C_{r}} = C_{\text{NSC}} \cdot \eta_{r} - C_{r} \cdot \gamma_{r} \end{array} $$
    (6)
  • And a nonlinear system:

    $$ \begin{array}{@{}rcl@{}} \dot{C_{\text{NSC}}} &=& C_{\text{NSC}} \cdot (- \eta_{r} - \eta_{w} - (v_{2}/(C_{f} + k_{2}))) + C_{f} \cdot \eta_{\text{NSC}}\\ \end{array} $$
    (7)
    $$ \begin{array}{@{}rcl@{}} \dot{C_{f}}& = &(C_{\text{NSC}} \cdot v_{2} /(C_{f} + k_{2})) + (C_{f} \cdot k_{1} \cdot v_{1} /(C_{\text{NSC}} + k_{1}))\\ &&+ C_{f} \cdot (-\eta_{\text{NSC}} - \gamma_{f}) \end{array} $$
    (8)
    $$ \begin{array}{@{}rcl@{}} \dot{C_{w}} &=& C_{\text{NSC}} \cdot \eta_{w} - C_{w} \cdot \gamma_{w} \end{array} $$
    (9)
    $$ \begin{array}{@{}rcl@{}} \dot{C_{r}} &=& C_{\text{NSC}} \cdot \eta_{r} - C_{r} \cdot \gamma_{r} \end{array} $$
    (10)

    where, Ċ is the changing amount of carbon in foliage (f), nonstructural carbohydrates (NSC), wood (w), and roots (r), with respect to time; k1, k2, v1, and v2 are Michaelis-Menten parameters; ηf is the flux rate of photosynthetically fixed C to foliage; ηNSC is the flux rate of C from foliage to NSC; ηw is the flux rate of C from NSC to wood; ηr is the flux rate of C from NSC to roots; γf,NSC is the flux rate of C from NSC to foliage; γf NSC is the flux rate of carbon from CNSC to foliage; γf,w,r is the rate at which C leaves each compartment (e.g. respiration, litter). The differences between them are highlighted in Table 3.

Table 3 Sets of equations of the models from the example

Both are autonomous systems (they are not driven by external variables such as temperature), where vegetation is divided into four compartments: nonstructural C (NSC), foliage, wood, and roots. The nonlinear model proposed here has a positive feedback loop that assures the increase in photosynthesis with an increase in foliage mass. This is represented with the nonlinear interaction between the foliage-dependent photosynthetic input and the dynamic partitioning: u(x,t) ⋅ b(x,t) (see Table 3).

These models were compared in terms of their stability, since this is one of the system properties that can be predicted from model structure. The calculations for this comparison can be found in thesupplementary material: Supplement.ipynb. We found the fixed points, which are points where the C input and output of each compartment cancel out; in other words, we computed the analytical solution of the systems in steady state. Since the nonstructural C (CNSC) was the only compartment that exchanged material with all the others, we followed the following procedure: First, we set the right-hand side of the ordinary differential equations for all variables except CNSC (Eqs. 810 for the nonlinear model) to 0. Then, we solved them for each state variable so that the solutions were in terms of CNSC. This reduced the dimensionality of our system from four to one dimensions. We replaced the state variables in the Eq. 7 with these solutions, which could then be solved, i.e. we found the solution of CNSC in steady state (Eq. 7 = 0). This solution was then computed into the remaining ODEs to find the solutions of the other state variables in steady state.

For the linear input, we found a single fixed point: [CNSC : 156.86,Cf : 137.25,Cw : 1568.63,Cr : 941.18]. This goes in line with what is expected for autonomous linear models with constant input, which according to Strogatz (1994) can only have one fixed point. In contrast, the intricate nonlinear interactions between the foliage and NSC compartments had consequences for the predicted fixed points. We found a couple of potential fixed points for the nonlinear model: [0,0,0,0,0] (empty system), and [CNSC : 286.70,Cf : 293.80,Cw : 2867.00,Cr : 1720.20]. The fixed point where all C stocks are 0 can only occur when the C input depends on a state variable, in this case the amount of carbon in the foliage (Cf). Thus, since these models are mass-balanced, if Cf reaches 0, no photosynthetic input can be generated and all compartments will decay until no C remains in them. Note also that since the steady-state solution for the Eq. 7 has a higher order than 1 (see supplement), the model may have more than these two fixed points, which means that there could be potential bifurcations or tipping points. For more information on this topic, please refer to (Strogatz 2014, p. 52).

Discussion

Dynamic vegetation models have been a powerful tool for enhancing our understanding of the dynamics of carbon allocation in vegetation, but their representation of this process often differs, and these differences impact the model performance (Walker et al. 2015). Considering that model-data synthesis projects have shown that model uncertainty can be reduced by identifying and evaluating the main assumptions causing differences among models (Medlyn et al. 2015), and that these assumptions are represented with different mathematical formulations, we performed a detailed analysis of the mathematical representation of the CA component of a representative sample of ecosystem models. This resulted in the identification of common structures from which we propose some generalizations. With this, we were able to obtain a deep insight into the properties of a system by identifying the factors on which photosynthesis and C partitioning and cycling depend on. This assessment motivated the formulation of the new mathematical framework proposed here, which emerges as a powerful tool to study carbon allocation in models.

This analysis brought to our attention that some carbon allocation models overlook properties that could be key to answer certain biological questions. For example, in most of the reviewed models, the photosynthetically fixed carbon is partitioned between compartments following predetermined coefficients or fractions (fixed partitioning scheme). Evidently, models with fixed coefficients for C partitioning and cycling would fail to simulate a vegetation that responds to environmental cues by adjusting its partitioning scheme: How can such a model be used to simulate the prioritization of a particular compartment when the environmental conditions change? Indeed, the compartments in models with fixed partitioning coefficients will always receive the same proportion of the photosynthetic input (independently from environmental cues), which means that these models may not be able to simulate observations such as the prioritization of the canopy of tropical forests a year after drought (Doughty et al. 2015). This would be a job for models with dynamic partitioning schemes, since they have functions in their β vector instead of coefficients, and these functions may consist of environmental variables (e.g. temperature, water availability) or state variables (i.e. amount of carbon in a given compartment at a given time) that regulate the amount of carbon partitioned to each compartment (Guillemot et al. 2015). However, it should be recognized that models with fixed coefficients are easier to implement and run, which is perhaps the reason why this strategy is often used in global vegetation models. In contrast, some models used for simulations in smaller scales (in more detail) often have dynamic partitioning of the photosynthetic input. Therefore, knowing the implications and advantages of each scheme can further help in the selection of model properties that can better represent the response mechanisms inferred from observations or boost the improvement of current models.

Another model simplification that could be problematic in certain contexts was the lack of carbon exchange between the vegetation compartments, which goes in hand with the absence of storage and labile compartments. Although this exclusion may have been rooted on the assumption that nonstructural or labile carbon is an overflow compartment, studies have discredited this assumption showing that its role is more important than that (Richardson et al. 2013; Dietze et al. 2014; Doughty et al. 2015; Hartmann et al. 2015; Martínez-Vilalta et al. 2016; Muhr et al. 2016). In fact, there is an interesting source-sink relationship between the vegetation C compartments and the labile C, and we speculate that this may have important implications in phenology (seasonal variability) in deciduous forests; the stored C can be cycled to the other compartments when the growing season starts, thus explaining the rapid loss and replenishment of carbon in foliage (Trumbore et al. 2015). Indeed, a study of climate sensitivity of NPP simulated by two dynamic global vegetation models concluded that these models could be improved with a more detailed representation of carbon reserves (Zhang et al. 2017). Although this dynamic has been represented in earlier individual-based models (Iwasa and Cohen 1989), this C exchange that results from phenology and storage was rarely observed within the structures of large-scale models. Thus, the phenology in such models might need to be represented with alternative strategies (e.g. hard coded rules) or might be left out completely, with the consequences that this may have for predicted carbon dynamics from season to season.

One of the characteristics that most of the reviewed models share is that they are linear; the carbon processes are expressed as the sum of the C fixed in the compartments and the one leaving the compartments—terms 1 and 2 of the first row in Eq. 1, respectively. This linear structure ignores possible nonlinear interactions among vegetation processes that can arise (e.g. when partitioning depends on state variables β(x,t)). Even though the linearity of some models is a simplification used to increase the speed of model runs without jeopardizing the trends at global scales, this characteristic could result in simulation errors when the aim is to model finer spatio-temporal scales (Merganičová et al. 2019). Most importantly, this simplification could be the reason why some models fail to capture the response of vegetation to extreme climatic changes; empirical findings suggest that climate change alone has contrasting effects on vegetation, and the effects of multiple driving factors on vegetation is not always additive (Leuzinger et al. 2011; Dieleman et al. 2012).

Categorizing models of interest in this framework has several advantages (as described above and in the following section), but a categorization performed solely based on the published model descriptions (PMDs) might not be advisable because they leave out important details. The PMDs of complex models such as the Dynamic Vegetation Models (DGVMs) often lack complete sets of equations, parameter values, or they do not describe the way in which the time steps are advanced when difference equations are used (e.g. LPJ by Sitch et al. (2003) and O-CN by Zaehle and Friend (2010)). Given that these details not only guarantee the reproducibility of the results, but the reliability of the model categorization within this framework, the source code of these models also needs to be assessed. As challenging as that process may be, the development of such models within this framework can be an efficient way to improve their performance and the one of the Earth system models to which they are coupled. Thus, we highly encourage other researchers to contribute with their own models to this database.

Individual-based models (e.g. aDGVM by Scheiter and Higgins (2009) and the single plant-based model proposed by Iwasa and Cohen (1989)) were not included in our review because interactions among many individuals are challenging to write analytically. However, we can think about these interactions as functions or maps that quantify the amount of carbon that is allocated to plant compartments or are transferred from one compartment to another. Solving individual-based models numerically and aggregating the results in plant compartments would allow one to write the matrices and vectors we propose. They would change over time due to all processes included in the original model, but we can recover these dynamics in an aggregated form. This strategy of capturing complex nonlinear interactions to reconstruct a compartmental system has been recently described in Metzler et al. (2018).

Implications and future studies

We have identified some of the impacts that certain model structures used to represent CA have on the qualitative behaviour of simulations, in agreement with other systematic studies (Holling 1973; Jacquez and Simon 1993; Strogatz 1994). This leads us to another of the main contributions of this framework, which is the ease to identify models that fall into the Compartmental Systems category. Compartmental systems are those whose compartments exchange material, and according to the qualitative theory proposed by Jacquez and Simon (1993), their structure may have important implications in the stability of the system. This was one of the main motivations for focusing this study in models where the vegetation is arbitrarily divided into compartments delimited by artificial boundaries, in spite of the fact that this delimitation does not exist in nature (Fox 1992b).

A stability analysis can inform us, for example, whether the modelled system could potentially reach a steady state, i.e. if each state variable in the model can converge to a fixed point in time, when its inputs and outputs are cancelled out. Note that systems with time-dependent variables such as f(x,t) = u(x,t) ⋅β(x,t) + B(x,t) ⋅x are classified by this theory as nonautonomous systems and do not converge to a steady state (see stability of nonautonomous systems in Jacquez and Simon (1993)). This is an interesting property because, even though it is highly debated whether the ecosystems can reach such a steady state, it is an assumption that has to be done in order to use some methods to diagnose the model performance (e.g. mean age of carbon) (Sierra et al. 2016).

Given that the structure of the models determines the dynamics that can be simulated (Holling 1973; Strogatz 1994), the modelled response of vegetation to changing climatic conditions depends on the variables and parameters used, as well as on their interactions, i.e. whether the systems are linear or not. Evidently, finding the right combination remains the most challenging part, because several aspects of the potential capability of vegetation to exploit climatic conditions remain unclear. Nonetheless, these and other interesting dynamics could be explored using models, and the software implemented in the database can help to test different hypothesis and parameter values to improve model performance. These improved models can then be used to guide field experiments (e.g. De Kauwe et al. 2017) by providing a priori predictions of the ecosystem response to treatments (Medlyn et al. 2016).

Perhaps one of the most important features of this framework is the division of the carbon allocation process into components of biogeochemical relevance (input: u, partitioning: β, and cycling: B); i.e. the complex conceptual design used in some models to represent the entire carbon allocation process in vegetation is arranged into smaller objects of interest that can be individually explored in detail. This is specially relevant for studies that aim at punctual aspects of the global carbon cycle as it is commonly done in empirical studies such as the study performed by Malhi et al. (2015), where they assessed the linkages between photosynthesis, productivity, growth, and biomass in lowland Amazonian forests using the largest data set assembled to that date. These empirical studies are crucial when trying to constrain terrestrial carbon cycle projections (Mystakidis et al. 2016), but it is important to know how to interpret the observation-based carbon flux estimates and incorporate them into models whose structure are a reflection of the observed carbon allocation strategies. Thus, this work could lead to a close dialog between empirical and theoretical research fields that could result in interesting findings.

Conclusions

Carbon allocation in vegetation is a complex process that involves C flowing into, between and out of plant compartments, at different rates. This concept can be generalized and formalized using the mathematical concept of compartmental dynamical systems, which is the backbone of the mathematical framework proposed here as means to provide an optimal common ground for the comparison of model structures. Essentially, with this matrix representation, important interactions within the elements of the model become conspicuous, e.g. nonlinear interactions among state variables and model dependencies on variables whose value can change with time. In this way, it facilitates the classification of models depending on their autonomy, i.e. nonautonomous models are driven by external variables that depend on time, whereas autonomous models do not, and according to their linearity, i.e. as linear and nonlinear models. Here, we found that many vegetation models can be represented in matrix form as a nonautonomous compartmental system of nonlinear ODEs. However, many models, particularly those used for large-scale simulations, use a linear representation with no interactions among ecosystem compartments.

We also identified some general properties of models according to the linearity and autonomy of the systems. We found that, although the linear structure is common, other structures with nonlinearities have been also proposed. These nonlinearities can be accommodated in the new mathematical framework proposed here. This framework has allowed us to assess models in a simple way and can be a powerful aid in the interaction between theoretical and empirical research in ecosystem ecology and other related fields. It is an interface that summarizes the models’ perspective of carbon allocation and highlights relations (dependencies) between factors that are normally studied separately and in detail in the field and the lab; so, hopefully both could find a common language to share their knowledge. This conceptual/mathematical framework may allow us to find new insights into the study of the global carbon cycle under changing climate and disturbance regimes, and identify gaps where more experiments and theoretical work is needed.