Principal process analysis of biological models
Abstract
Background
Understanding the dynamical behaviour of biological systems is challenged by their large number of components and interactions. While efforts have been made in this direction to reduce model complexity, they often prove insufficient to grasp which and when model processes play a crucial role. Answering these questions is fundamental to unravel the functioning of living organisms.
Results
We design a method for dealing with model complexity, based on the analysis of dynamical models by means of Principal Process Analysis. We apply the method to a wellknown model of circadian rhythms in mammals. The knowledge of the system trajectories allows us to decompose the system dynamics into processes that are active or inactive with respect to a certain threshold value. Process activities are graphically represented by Boolean and Dynamical Process Maps. We detect model processes that are always inactive, or inactive on some time interval. Eliminating these processes reduces the complex dynamics of the original model to the much simpler dynamics of the core processes, in a succession of submodels that are easier to analyse. We quantify by means of global relative errors the extent to which the simplified models reproduce the main features of the original system dynamics and apply global sensitivity analysis to test the influence of model parameters on the errors.
Conclusion
The results obtained prove the robustness of the method. The analysis of the submodel dynamics allows us to identify the source of circadian oscillations. We find that the negative feedback loop involving proteins PER, CRY, CLOCKBMAL1 is the main oscillator, in agreement with previous modelling and experimental studies. In conclusion, Principal Process Analysis is a simpletouse method, which constitutes an additional and useful tool for analysing the complex dynamical behaviour of biological systems.
Keywords
Dynamical systems Biological networks Process analysis Model reduction Parameter sensitivity analysis Circadian clockBackground
Mathematical modelling has been used for decades as an approach to understand the functioning of biological systems in terms of their internal processes and components. The latter form complex networks that vary in nature. For instance, biochemical networks include processes controlling the intracellular level of metabolites, RNAs and proteins, which allow cells to live and grow. A process either corresponds to a single biochemical reaction, for example protein phosphorylation, or encompasses many biochemical reactions like those involved in general cell functions (translation of proteins, transcription of RNAs...). In ecological networks, the processes can refer to events influencing the distribution and abundance of organisms, or to fluxes of energy and matter.
Numerous kinetic models of these networks have been developed in computational biology, of increasing complexity due to advances in modelling and parameter estimation approaches (see [1, 2] for an example). Complexity arises from the high dimension of the networks, the large number of biological processes involved and their non linearity due to the complex feedback loops that regulate them.
One approach often used to tackle the problem of complexity is model reduction (see [3] for a recent review). The simplified models are easier to analyse, while retaining the main features of the original ones and their biological significance. Briefly, methods of model reduction shorten the list of network species or of network reactions (e.g. [4, 5]), lump state variables (e.g. [6]) or decompose the system into slow and fast dynamics (e.g. [7, 8, 9]). The often used quasisteadystate approximation falls in the latter category (e.g. [10]). Other approaches simplify the mathematical functions describing the molecular processes. For instance, piecewise affine differential equations approximate by step functions the sigmoidal functions used to describe the regulation of gene expression. The dynamics of the simplified system can be easily analysed by means of state transition graphs [11]. However, these simplifications are generally restricted to models of gene expression and are more difficult to apply to other types of networks [12].
Reduction approaches have proven successful to significantly reduce model complexity, but they do not provide a mean to understand how the system dynamics emerges from the cascade of biological processes and regulatory mechanisms at work. This is especially true when the reduced models remain complex, with many coupled equations sharing common processes and involving complex feedback loops. For instance, regulatory mechanisms switch on certain biological processes at some times and off at others. It is thus important for a good understanding of the system behaviour to identify which and when processes significantly influence the system dynamics. In other words, instead of analysing a single reduced model in place of the original one, valid on the whole time interval, we may want to analyse series of simplified models highlighting the important processes of the original model during the periods of time in which they are active.
This is how we address the problem of high dimensional model analysis in this study. We develop a mathematical and numerical approach based on the boolean concept of activity/inactivity. The method, called Principal Process Analysis (PPA), determines the contribution of each biological process to the output of the dynamical system. In models of biological networks, these processes appear in a linear additive manner in each ODE. We first identify the inactive processes and neglect them. In a second step, we treat processes whose activity varies along time: we define time windows in which these processes are either always active or always inactive. We eventually create submodels for each time window that only contain the active processes. This procedure leads to the simplification of the system to its core mechanisms. The simplified system can be further studied, to understand the role of each active process in the system dynamics.
PPA is a general approach that can be easily applied to any biological system described by ordinary differential equations (ODEs). It shares common features with a model reduction method focusing on major model parameters rather than processes [4], in which parameters that are not required for the system behaviour are removed. Another approach dedicated to chemical reactions identifies and removes chemical species that contribute less to the model output [5]. In this case, the problem is solved using optimization approaches (see also [13]). Despite these similarities, PPA is not a model reduction approach. It provides a mean to access to and dissect the more complex dynamics of the original model through the analysis of simplified versions in given time windows. Results are easily interpretable and do not require additional and complicated computations.
Preliminary work on PPA has been described in an earlier conference paper [14], in which we applied PPA to two ODE models of biochemical networks whose simplification preserved their dynamical behaviour: the model of circadian rhythms in Drosophila [15] and the model of the regulation of the ERK signalling pathway [16]. Questions remained open though, concerning the scalability of the approach and its robustness: to which extent does PPA preserve model dynamics in systems of higher dimension, with many more biological processes involved and including interlocked feedback loops? And since the approach requires a priori knowledge of the parameter values, how sensitive are process activities or inactivities to the value of these parameters? In this study, we address these questions by studying a much more complex model of circadian rhythms in mammals, including 16 variables, 76 processes, and intertwined positive and negative feedback loops [17]. Parameter sensitivity analysis of the global relative error between the original and reduced systems allows us to assess the quality and robustness of our approach.
The paper is organized as follows. “Methods” section describes the principle of Principal Process Analysis as well as global sensitivity analysis. “Model description” section introduces the model of mammalian circadian clock. We apply our approach to this complex model in “Principal Process Analysis of the circadian clock model” to “Influence of parameter values” sections, before concluding in “Conclusions” section.
Methods
Principal Process Analysis (PPA)
where f_{ ij } represents the j^{ t h } process involved in the dynamical evolution of the i^{ t h } variable of the system over a period of time [ t_{0},T].
Example 1
where \(f_{14,1}=  V_{3B} \frac {B_{N}}{K_{p}+B_{N}},...,f_{14,7}=  k_{dn} B_{N} \).
where 0≤W_{ ij }(t,p)≤ 1 and \(\sum _{j}W_{ij}(t,p)=1\).
Definition 1
Let the continuous function f_{ ij }(x(t),p) be the j^{ t h } process of \(\dot {x}_{i}(t)\) in t ε[t_{0},T] and let the threshold δε [0,1]. We call a process f_{ ij }(x(t),p) always inactive when W_{ ij }(t,p)<δ∀ t ε [0,T]. We call a process f_{ ij }(x(t),p) inactive at time t when W_{ ij }(t,p)<δ. We call a process f_{ ij }(x(t),p) active at time t when W_{ ij }(t,p)≥δ. Switching time for a process f_{ ij }(x(t),p) is the time \(t_{ij}^{s}\) at which W_{ ij }(t,p)=δ. A process can have 0,1,…,z switching times. The switching time set S_{ i } for the i^{ t h } variable contains all the switching times \(t_{ij}^{s}\) where j=1,..,k and s=1,…,z. The global switching time set S is the union of all S_{ i }.
The choice of δ is important, since it determines above which weight a process can be considered active or inactive and, as we will see it later, if the process should be kept or omitted in the simplified model. An excessively high value might lead to an oversimplified model, without many dynamical features of the original model. Conversely a very low value might result in a model insufficiently simplified, which remains too complicated to analyse. From our experience, a convenient value is δε [0, 0.1], where the value of δ can be adjusted to the number of processes. For instance, if an ODE contains numerous processes of similar value, each individual process weighs little. In this case, δ should not be chosen too high to avoid omitting all these processes; it can be inversely proportional to the total number N of processes in the equation: \(\delta \propto \frac {1}{N}\). In this paper, finetuning the threshold value is not justified: there are not many processes per equation and they have very different values. We will always take δ=0.1.
Example 2

The weight of processes W_{14,2}, W_{14,6}, W_{14,7} is always below δ: their related processes f_{14,2}, f_{14,6}, f_{14,7} are thus always inactive;

The processes W_{14,1} and W_{14,3} are always above δ: f_{14,1} and f_{14,3} are active during the whole system dynamics;

The weight of processes W_{14,4} and W_{14,5} crosses the threshold twice and the switching times \(t_{14,4}^{1}=4.4\)h, \(t_{14,4}^{2}=20.7\)h, \(t_{14,5}^{1}=0.8\)h and \(t_{14,5}^{2}=20.3\)h are collected in the set S_{14}.
Visualization of process activities

The Boolean Process Map summarizes qualitatively the knowledge of the process activity or inactivity along time for each variable. A black bar means that the process is active, while the white bar indicates an inactive process.
 Example: The Boolean Process Map in Fig. 2a represents the process activities deduced from the dynamical weights in Fig. 1b. We observe that there is always an active phosphorylation of BMAL1 in the nucleus, while the basal degradation can be considered always inactive. The nuclear export is solely active in the first and last periods of time.

The Dynamical Process Map is a network representation of the process activities. Variables (represented by boxes) are connected by processes (arrows). Three cases arise, which depend on the activity of processes shared by several variables: blackcoloured arrows represent processes that are inactive for all variables involved, while active processes are displayed as red arrows. Yellow arrows are used for processes shared by several variables that have different activities: for instance, one process is considered active in one equation, but inactive in another one. Note that the model simplification by elimination of inactive processes, as will be described in “Model simplification by elimination of always inactive processes” section, will have for effect to remove black arrows in the Dynamical Process Map.

Example 3 Figure 2 b represents the Dynamical Process Map for x_{14}, the nuclear concentration of Bmal1, in the time interval between \(t^{1}_{14,4}\) and \(t^{1}_{14,5}\). Phosphorylation is an example of active process for the nuclear BMAL1 concentration (see the Boolean Process Map in Panel A). It is shown in red because it is also considered active at the same moment for the other variable sharing this process, the concentration of phosphorylated BMAL1.

The 3D Process Map represents the timedependent evolution of the intensity of each process. Process activities are averaged per hour, which leads to a discretisation of time. Vertical bars represent process weights for each hour. Their color code represents the intensity of process weights relatively to the other weights.

Example 4 Figure 2 c describes the 3D Process Map for the concentration of nuclear BMAL1. The phosphorylation of the protein, its nuclear import and its consumption for the formation of a large complex are the processes the most active over time.
Model simplification by elimination of always inactive processes
Eliminating processes that play a minor role in the system dynamics facilitates the analysis of large models. Since in the previous steps of PPA we have determined the process activities in system (2), we now neglect processes that are considered always inactive. This will give us g(x^{ r }), the function approximating f(x) in (2) with less processes.
where \(x^{r}=\left (x_{1}^{r},x_{2}^{r},\ldots,x_{n}^{r}\right) \epsilon \mathbb {R}^{n}\) is the vector of component concentrations, \(x0=\left (x0_{1},x0_{2},\ldots,x0_{n}\right) \epsilon \mathbb {R}^{n}\) the vector of their initial values, and \(p^{r} \epsilon \mathbb {R}^{c}\), where c≤b is the vector of parameters. The model simplification approach relies basically on the following theorem: if the vector fields of two systems are close (f(x)≈g(x)), then the solutions of the original and approximated systems are close during some time interval under the assumptions on the Lipschitz conditions listed in [18, p. 79, Th. 2.5].
Based on the dynamical weights determined in “Principal Process Analysis (PPA)” section and the threshold value δ, we apply the following rule to define g(x^{ r },p^{ r }): if W_{ ij }(x(t),p) < δ∀ t ε [ t_{0},T]theng_{ ij } = 0; if not, g_{ ij }≡f_{ ij }.
We thus define x^{ r } as an approximation of x and p^{ r } as a subset of p.
Example 5
Note that Principal Process Analysis is applied to each ODE separately. As a consequence, processes shared by two equations can be active in one equation, but inactive in the other. Elimination of the inactive processes breaks mass balance in the simplified model. For our purpose, this is not an issue: the simplification does not aim at reducing the model, but rather analysing a submodel of the original one, which describes the dynamics of the important phenomena.
It is interesting to quantify the extent to which the simplified system (6) preserves the behaviour of the original one. This gives a better sense of how the active processes kept in the simplified model are responsible for the dynamics of the original system. In addition, this helps identifying potential problems related to the model simplification, for instance involving a wrong choice of the δ value. One can also imagine pathological cases, when the simplified system does not reproduce the main dynamical features of the original model: for instance, if it evolves towards a different basin of attraction or if the removal of a consumption term does not compensate a synthesis term any more, leading the simplified system to explode in finite time. It is non nonsensical in all these cases to analyse simplified models that behave so differently from the original ones. The δ threshold should be adjusted to a new value and Principle Process Analysis rerun until model simplification proves satisfactory according to the criteria described below.
How to choose the model outputs? They can correspond to all model variables or combinations of them, if the latter are involved in some biological phenomena of interest for instance. In the case of the circadian clock model, six variables were specifically studied in the original papers [17, 19], which we will use as outputs to determine the global relative error between the original and simplified models: the concentrations of Per mRNA (M_{ P }), Cry mRNA (M_{ C }), Bmal1 mRNA (M_{ B }), total PER protein (P_{ Tot }), total CRY protein (C_{ Tot }) and total BMAL1 protein (B_{ Tot })^{1}.
Creation of sequences of submodels
In the previous step of PPA, the models are simplified by elimination of always inactive processes. Here we go one step further in the simplification, by eliminating processes that are inactive at times. This is achieved by decomposing the period of time during which the system evolves into time intervals. To that end we use the switching times t_{ b } (with b=1,…,d) determined in “Principal Process Analysis (PPA)” section: this allows creating a succession of submodels for each time interval, which contain the core mechanisms in that period of time.
where μ_{ v } is the mean of the switching times in C_{ v }. The consequence is that processes with switching times belonging to cluster C_{ v } are assumed to switch together at the same time \(t^{r}_{v}=\mu _{v}\), the mean switching time in cluster C_{ v }.
where \(n^{v}_{act}\) denotes the number of active processes in the v^{ t h } time window.
We eventually end up with a sequence of z+1 submodels in the time interval [ 0,T], the first one being valid in \(\left [0,t^{r}_{1}\right ]\) and the last one, in \(\left [t^{r}_{z},T\right ]\).
We compute the error (11) between the original model and each submodel, with or without propagating errors: in the first case, for each time window \(\left [t^{r}_{v1},t^{r}_{v}\right ]\) (v=1,…,z+1 with \(t^{r}_{0}=t_{0}\) and \(t^{r}_{z+1}=T\)), the initial values of the h outputs of submodel SM_{ v } are equal to the final values at \(t^{r}_{v1}\) of submodel SM_{v−1}; in the second case, they are equal to the values of the original model at \(t^{r}_{v1}\).
Global sensitivity analysis
Principal Process Analysis is applied to models with given parameter values and initial conditions. It may be questioned whether the uncertainty of their values influences the simplification of the model and thus, the analysis of the system dynamics. While we have shown PPA to be robust to variations of initial conditions in [21], the question remains open for parameter values.
To that aim, we perform global sensitivity analyses on the global relative errors between the original model and the reduced model (defined in Eq. (11)). Such an analysis consists in quantifying the parameter influence on the error, while varying the parameters simultaneously in given ranges. In contrast, in a local sensitivity analysis, parameters would vary oneatatime in the neighbourhood of their nominal value. First, we perform an analysis on each of the six errors defined for the six model outputs \( \left (e^{v}_{M_{P}},e^{v}_{M_{C}},e^{v}_{M_{B}},e_{P_{Tot}},e^{v}_{C_{Tot}},e^{v}_{B_{Tot}}\right)\). Then, in a more detailed analysis, we compute the global relative error for each state variable, according to Eq.(11) (with y_{ h }=x_{ i },i=1,…,16); sensitivity analyses are also performed on each of these 16 errors. The method used is based on factorial design [22], analysis of variance (ANOVA) and principal component analysis (PCA) [23].
where \(e^{v}_{h,j}\) is the error computed according to Eq. (8) for output (or state variable) h, time window v, and parameter combination j (j=1,…,N_{ j }) of the fractional factorial design; \(\mu ^{v}_{h}\) is the grand mean; \(\alpha ^{v}_{h,f(j)}\) is the main effect of parameter p_{ f } for parameter combination j; \(\beta ^{v}_{h,f(j)k(j)}\) is the interaction effect between parameters p_{ f } and p_{ k } (k≠f) for parameter combination j; and \(\epsilon ^{v}_{h,j}\) is the residual. Each main effect \(\alpha ^{v}_{h,f(j)}\) can take two values, according to the level of parameter p_{ f } in combination j: \(\alpha ^{v}_{h,f^{+}}\) or \(\alpha ^{v}_{h,f^{}}\). Similarly, each twoway interaction effect can take four values: \(\beta ^{v}_{h,f^{+}k^{+}},\: \beta ^{v}_{h,f^{+}k^{}},\: \beta ^{v}_{h,f^{}k^{+}},\: \beta ^{v}_{h,f^{}k^{}}\). The fractional factorial design determines the parameter combinations needed to estimate all main effects and twoway interactions. It is obtained using R package planor^{2} and consists of N_{ j }=2^{12} parameter combinations, yielding as many simulations. According to the sparsityofeffects principle, a system is usually dominated by main effects and low order interactions, so neglecting thirdorder and higher interactions can still provide good estimates.
We use the multisensi R package^{3} for this analysis.
In what follows, we show how Principal Process Analysis can help with the analysis of complex biological models. We apply the approach to a model of the circadian clock developed in [17, 19], which we describe in the following section.
Results
Model description
 Transcription of genes Per, Cry and Bmal1 occurs in the nucleus. The newly synthesized mRNAs are exported to the cytosol.

In the cytosol, the mRNAs can be either degraded or translated into proteins, which ones are subsequently phosphorylated (the process is reversible). Unphosphorylated proteins PER and CRY form the complex PERCRY, which reversibly enters the nucleus. The nuclear and cytosolic forms of the complex can be phosphorylated. Likewise, protein BMAL1 is reversibly phosphorylated and reversibly enters the nucleus, but sole its unphosphorylated form makes a complex with protein CLOCK. Phosphorylated proteins and complexes in the nucleus or the cytosol are subject to degradation.

In the nucleus, the complex CLOCKBMAL1 activates the transcription of Per and Cry genes. Activation is stopped by binding of the PERCRY complex to CLOCKBMAL1, which indirectly inhibits Per and Cry transcription.

The concentration of CLOCK protein is not a variable in the model because it is constitutively expressed at high levels and considered to be not limiting [17].
The 16 model equations, 56 parameter and 16 initial condition values are shown in Appendix B. The model dynamics is difficult to analyse though, as the circadian clock involves numerous processes, including interlocked positive and negative feedback loops responsible for the oscillatory behaviour of the clock proteins. Reducing the original model around its core active processes can facilitate the model analysis, without changing significantly the original dynamics, in particular the sustained oscillations of the solutions.
Principal Process Analysis of the circadian clock model
We apply Principal Process Analysis to identify major processes of the circadian clock model. To this end we decompose each ordinary differential equation in processes, as shown in Eq. (4) for BMAL1. Each process has a biological interpretation and corresponds to a regulatory mechanism or a biochemical reaction.
We then calculate the relative weight of each process using Eq. (5) and the threshold value δ=0.1.
Global relative errors between the original and reduced models for the six outputs
Global relative error  

Output  M _{ P }  M _{ C }  M _{ B }  P _{ Tot }  C _{ Tot }  B _{ Tot } 
Error  0.2499  0.2148  0.1535  0.2648  0.1326  0.2053 
Creation of submodels

SM1, valid from \(t_{0}^{r}=0\) to \(t_{1}^{r}=0.9\) h: neglected processes for this model are always inactive (32% of the total). This model corresponds to the simplified model obtained in “Principal Process Analysis of the circadian clock model” section.

SM2, from \(t_{1}^{r}=0.9\) h to \(t_{2}^{r}=6\) h: 46% of the processes are neglected. In addition to the always inactive listed in “Principal Process Analysis of the circadian clock model” section, we have the following inactive processes in this model: cytosolic dephosphorylation of PER, CRY, and PERCRY; cytosolic dissociation of PERCRY; nuclear dephosphorylation of PERCRY; PERCRY export from the nucleus; and formation of the large complex PERCRYCLOCKBMAL1.

SM3, from \(t_{2}^{r}=6\) h to \(t_{3}^{r}=12.5\) h, in which 50% of processes are neglected. In addition to the processes listed in “Principal Process Analysis of the circadian clock model” section, inactive processes are in this case: transcription of Per and Cry mRNAs; cytosolic phosphorylations and dephosphorylations of PER and CRY; cytosolic dephosphorylation of PERCRY; nuclear phosphorylation and dephosphorylation of PERCRY; and nuclear export of BMAL1.

SM4, from \(t_{3}^{r}=12.5\) h to \(t_{4}^{r}=20\) h, which neglects 42% of processes. The processes include the processes listed in “Principal Process Analysis of the circadian clock model” section, as well as: PER and CRY translation; formation of the PERCRY complex in the cytosol; PERCRY dephosphorylation in the cytosol and the nucleus; and export of BMAL1 from the nucleus.

SM5, from \(t_{4}^{r}=20\) h to \(t_{5}^{r}= 24\) h, in which 46% of the processes are neglected. With the processes listed in “Principal Process Analysis of the circadian clock model” section, other neglected processes are: cytosolic dephosphorylation of PER and CRY; PERCRY dissociation in the cytosol; export of PERCRY; PERCRY dephosphorylation both in the cytosol and the nucleus; and PERCRYCLOCKBMAL1 formation.
See also Appendix D for the list of neglected processes in each submodel.
Global relative error between the original model and each submodel without propagation error for the six outputs
Global relative error  

Output  M _{ P }  M _{ C }  M _{ B }  P _{ Tot }  C _{ Tot }  B _{ Tot } 
Error SM1  0.0044  0.0044  0.0044  0.0208  0.0195  0.0073 
Error SM2  0.0519  0.0434  0.0453  0.0397  0.1832  0.0402 
Error SM3  0.2059  0.2951  0.0360  0.1427  0.2233  0.0356 
Error SM4  0.0143  0.0377  0.0389  0.0678  0.1164  0.0210 
Error SM5  0.0146  0.0032  0.0230  0.1150  0.0237  0.0053 
Since the dynamics of the coupled submodels remain close to the original one, we can further analyse the behaviour of the network simplified to its core processes. We use the Dynamical Process Maps for the different submodels (Appendix E), together with the process activities in Fig. 4 and the model outputs in Fig. 7. The simplified models preserve the three main interlocked feedback loops described in the original model, one positive and two negative loops. The functioning of these loops is directly affected by changes of process activities. Among the two negative feedback loops, which one is the main oscillator? One negative feedback loop involves the inhibition of Bmal1 transcription by the nuclear form of BMAL1 associated to the protein CLOCK. If this mechanism is the main source of oscillations, we should observe wide changes in process activities controlling BMAL1 levels. The total concentration of the protein does not vary much in amplitude (Fig. 7). It mainly decreases in SM2 and SM3, when the concentration of PERCRY is also high and forms a complex with CLOCKBMAL1, which is subsequently degraded. This degradation process is active most of the time (Fig. 4 and Appendix E), but variations of the total BMAL1 concentration do not modify strongly the transcription of Bmal1 mRNA, which remains always active. As well, the other processes of translation, phosphorylation and degradation for this variable almost never switch between inactive and active states over time (Fig. 4 and Appendix E). Overall, this suggests that the negative feedback loop involving CLOCKBMAL1 is not the main oscillator. A similar conclusion was drawn for the original model in [17].
The other negative feedback loop inhibits Per and Cry transcription through the titration of CLOCKBMAL1 by PERCRY to form the inhibitory complex PERCRYCLOCKBMAL1. The total concentration of BMAL1 peaks before that of PER and CRY, as can be seen in Fig. 7 for SM2 and SM3. When its concentration is maximal in SM1 and SM2, the nuclear form of the protein associated to the protein CLOCK stimulates the transcription of Per and Cry genes, in conditions where light has also a stimulatory effect on the transcription of these two genes. The processes of transcription and translation of Per and Cry are active in both models, as a result of which levels of PER and CRY raise to reach their maximal concentration in SM3. As can be seen from the process activities in Fig. 4 and the Dynamical Process Maps in Appendix E, conditions are favourable for the accumulation of high levels of complexes PERCRY and CLOCKBMAL1PERCRY in the nucleus. For instance, numerous processes decreasing PER, CRY and PERCRY concentrations in the cytosol and the nucleus are inactive: their phosphorylation is reduced (the process is inactive for the dephosphorylated forms but still active for the phosphorylated ones), which limits their degradation, and the nuclear import of PERCRY is always active. During the same period of time, the formation of the large complex CLOCKBMAL1PERCRY, which is active for both CLOCKBMAL1 and PERCRY (Fig. 4 and Appendix E), suggests that the nuclear forms of PERCRY and CLOCKBMAL1 bind as soon as they accumulate in the nucleus. The large complex is immediately degraded since its degradation process is always active and its dissociation, always inactive.
In SM2 and SM3, the degradation of the large complex is not compensated for by other mechanisms allowing BMAL1 accumulation in the nucleus: the cytosolic form of the protein is actively phosphorylated and then degraded, while its dephosphorylation is inactive, which reduces the quantity of protein to be imported in the nucleus (see Fig. 4 and the Dynamical Process Maps in Appendix E). In this compartment, the absence of active dephosphorylation, together with the active protein phosphorylation, also contribute to decrease pools of CLOCKBMAL1 complexes (Fig. 4, Appendix E). This halts transcription of Per and Cry mRNAs in SM3 (the processes are inactive and light is also switched off towards the end of SM3). This also affects the translation of PER and CRY, which becomes inactive in SM4. Altogether these observations suggest that the negative feedback loop inhibiting Per and Cry transcription via the complex CLOCKBMAL1PERCRY is the main source of circadian oscillations. This is consistent with conclusions in [17], where a second oscillator based on the autoinhibition of BMAL1 has been obtained for specific parameter values only. These results are also consistent with the observation of arrhythmic behaviours in mutant mice with double knockout of the Per and Cry genes [25, 26].
The positive feedback loop activates Per and Cry transcription through a control of protein stability mediated by the phosphorylation processes. In the model, sole the phosphorylated forms of the proteins are degraded. We observed that the reversible phosphorylation reactions are often displaced in the forward sense, as dephosphorylation processes are often found inactive. In particular, they contribute to decrease the concentration of PER, CRY and PERCRY, which also diminishes the concentration of the large complex CLOCKBMAL1PERCRY and thus relieves the inhibition exerted by the complex on transcription of Per and Cry genes. Kinetic modelling of the circadian clock in Drosophila has shown the importance of this positive feedback loop for circadian rhythms [27].
Influence of parameter values
Percentage of tGSI for parameters contained in inactive processes
% tGSIinactive  

SM  SM1  SM2  SM3  SM4  SM5 
R_{ h }(%)  19.11  15.55  59.54  0  0 
Discussion
A challenging task when analysing the dynamics of biological networks is to understand the relation between the network behaviour and its numerous processes, the activity of which is switched on and off by regulatory mechanisms. Model reduction is one possible approach to deal with model complexity and help deciphering the design principle of these networks. However, the reduced models can be still too complex to study and this does not answer the question on the role of each individual process in the network behaviour. Ideally, one would like to identify the major processes, quantify and then understand their contribution to the system dynamics. Principal Process Analysis was developed with this objective in mind, and with the final goal of simplifying the original model in one or several submodels around core active processes that are responsible for the dynamics of the original system. The dynamics of the core processes is much more tractable in the submodels than in the original one. Questions remained open though concerning the scalability and robustness of this approach.
In this paper we tested the scalability of Principal Process Analysis by applying the approach on a model of high dimension, the mammalian circadian clock model, which incorporates numerous processes and complex interlocked feedback loops responsible for oscillatory behaviours. Simplification of the original system dynamics to as much as 50% of its processes in five coupled submodels helped us relate the dynamics of the simplified models to the system components and their active interactions. We hence observed that the negative feedback loop controlling Per and Cry transcription through the formation of the large complex PERCRYCLOCKBMAL1 is the main oscillator, in agreement with previous experimental and modelling studies [17, 25, 26]. Principal Process Analysis has been also applied with success to diverse biogeochemical and biochemical models in our group and elsewhere, see [14, 21, 28, 29]. These case studies and the present one exemplify the applicability and scalability of the approach to models of diverse nature and and complexity.
In this paper, the quantification of global errors allowed us to conclude that the simplified models reproduce well the behaviour of the original ones. Even in the case of the largest errors observed on the model output, did the simplified models preserve the oscillations of the clock proteins. Since Principal Process Analysis is based on the a priori knowledge of the model parameters, it was important to assess the robustness of the approach to uncertainties on these parameter values. Through a global sensitivity analysis, we studied the impact of of parameter values on the error between the original model and the simplified submodels. Not only was the variation of the error small, but it was mostly due to parameters of the neglected processes. With this analysis, we proved the robustness of PPA to parameter uncertainty. In addition, we provided clues to identify and solve potential troubles related to the model simplification, in order to decrease errors between the original model and the simplified submodels.
In a recent study, we also showed by other methods the robustness of PPA to initial conditions [21]. The latter were supposed to lie in rectangles contained in a region of the variable space varying by one order of magnitude in each coordinate. Under additional assumptions on the monotonicity of biological processes within rectangles, the maximal bound of process weights was computed, which allowed identifying active processes in each rectangle, similarly to “Principal Process Analysis (PPA)” section, for which weight is above the threshold value δ. Based on the behaviour of processes on the edges of the rectangles, it was then possible to determine the transitions between rectangles and deduce the evolution of process activities along the different transitions. The method has been applied on a small gene expression network containing a negative feedback loop [21]. The same principles could be applied to show the robustness of the model to larger variations of its parameters. In this case, the parameter space should be divided in rectangles in which the activity/inactivity of processes is studied. Such extension of the method is part of a future work.
In the current state of development, Principal Process Analysis is not a model reduction approach. For instance, the elimination of the inactive processes from the original model breaks down the mass conservation relations when eliminating a process in one equation that is considered active in others. As long as the purpose of PPA is to analyse the important processes in the original model, this is not an issue. Nevertheless, the approach could be extended so as to preserve mass conservation relations. In addition, simplified models with much smaller global relative errors could be obtained, so that the simplified submodels represent more accurately the original model. We are currently studying a refinement of PPA by considering three different levels of activities (inactive, active, fully active), defined by two different thresholds in order to improve the quality of the model simplification and model analysis. Such improvements could bring PPA closer to a reduction method, since the simplified models become accurate representations of the original model.
Conclusions
Mathematical models of biological systems have grown in complexity to include large numbers of processes. As a consequence, their contribution to the system dynamics becomes hardly tractable. The current manuscript contributes to this problem with the development of Principal Process Analysis. Provided the ODE model of the system is composed of a linear sum of terms describing each a process, the method enables the identification of the major processes contributing to the system dynamics and when they play a key role. Removing inactive processes allows restricting the dynamical analysis of the system to its core processes and facilitates the understanding of the system functioning. The conclusions derived with the method are robust to fluctuations in the parameter values. As such, Principal Process Analysis can be applied to any type of ODE models with the same form.
Appendix A: Estimate of errors
We give in this appendix a rough estimate of the a priori error, based on bounds in the model and simple lemmas to compare the solutions of two differential equations. We refer to [18, Chapter 3], for the basic notions.
where f_{ ij } represents the j^{ t h } process involved in the dynamical evolution of the i^{ t h } variable of the system over a period of time [ t_{0},T], and n_{ i } is the number of processes for \(\dot {x}_{i}\).
where B_{ i } is an upper bound for \(\sum _{j=1}^{n_{i}}f_{ij}(x(t),p)\) obtained from the variable bounds on domain D. All B_{ i } form vector B.
If the initial conditions are the same (x(t_{0})=y(t_{0})), then Theorem 3.4 in [18], which is based on Gronwall’s Lemma, gives a bound between the two solutions x and y:
The same proof applies when several f_{ ij } are inactive for some variables. One just need to sum the errors in Eq. (22).
This gives a rough bound between the two solutions; this bound is theoretical and conservative, and is not used in the practical a posteriori computation of the error in our work. Nevertheless, it shows that the error is roughly proportional to the threshold δ used in the weight computations.
Appendix B: Full mammalian model
Model equations
Model parameters
Parameters listed in [17, p.546]: Set 1.
k_{1}(h^{−1})=0.4, k_{2}(h^{−1})=0.2, k_{3}(nM^{−1}h^{−1})=0.4, k_{4}(h^{−1})=0.2, k_{5}(h^{−1})=0.4, k_{6}(h^{−1})=0.2, k_{7}(nM^{−1}h^{−1})=0.5, k_{8}(h^{−1})=0.1, K_{ AP }(nM)=0.7, K_{ AC }(nM)=0.6, K_{ IB }(nM)=2.2, k_{ dmb }(h^{−1})=0.01, k_{ dmc }(h^{−1})=0.01, k_{ dmp }(h^{−1})=0.01, k_{ dnc }(h^{−1})=0.12, k_{ dn }(h^{−1})=0.01, K_{ d }(nM)=0.3, K_{ dp }(nM)=0.1, K_{ p }(nM)=0.1, K_{ mB }(nM)=0.4, K_{ mC }(nM)=0.4, K_{ mP }(nM)=0.31, k_{ stot }(h^{−1})=1.0, k_{ sB }(h^{−1})=0.12k_{ stot }, k_{ sC }(h^{−1})=1.6k_{ stot }, k_{ sP }(h^{−1})=0.6k_{ stot }, n=4, m=2, V_{ phos }(nMh^{−1})=0.4, V_{1B}(nMh^{−1})=0.5, V_{1C}(nMh^{−1}) = 0.6, V_{1P}(nMh^{−1}) = V_{ phos },V_{1PC}(nMh^{−1})=V_{ phos }, V_{2B}(nMh^{−1})=0.1, V_{2C}(nMh^{−1})=0.1, V_{2P}(nMh^{−1})=0.3, V_{2PC}(nMh^{−1})=0.1, V_{3B}(nMh^{−1})=0.5, V_{3PC}(nMh^{−1})=V_{ phos }, V_{4B}(nMh^{−1})=0.2, V_{4PC}(nMh^{−1})=0.1, v_{ dBC }(nMh^{−1})=0.5, v_{ dBN }(nMh^{−1})=0.6v_{ dCC }(nMh^{−1})=0.7, v_{ dIN }(nMh^{−1})=0.8, v_{ dIN }(nMh^{−1})=0.8, v_{ dPC }(nMh^{−1})=0.7, v_{ dPCC }(nMh^{−1}) = 0.7,v_{ dPCN }(nMh^{−1}) = 0.7,v_{ mB }(nMh^{−1}) =0.8, v_{ mC }(nMh^{−1})=1.0, v_{ mP }(nMh^{−1})=1.1, vstot(nMh^{−1})=1.0, v_{ sB }(nMh^{−1})=v_{ stot }, v_{ sB }(nMh^{−1})=v_{ stot }, v_{ sC }(nMh^{−1})=1.1v_{ stot }, v_{ sP }(nMh^{−1})=1.5v_{ stot }
Switching times (s.t), their values (v.) in [h] and associate reduced (cluster) switching times\(\left (t^{r}_{1},t^{r}_{2},t^{r}_{3},t^{r}_{4}\right)\) (s.t.c.): \(t_{1}^{r}\) is associated to the cluster of \(t_{1} t_{6}, t^{r}_{2}\) to \(t_{7}  t_{15}, t^{r}_{3}\) to t_{16}−t_{26}, and \(t^{r}_{4}\) to t_{27}−t_{46}
s.t.  v.  s.t.c.  s.t.  v.  s.t.c.  s.t.  v.  s.t.c.  s.t.  v.  s.t.c. 

t _{0}  0  t _{12}  5.9  6  t _{24}  13.5  t _{36}  19.5  20  
t _{1}  0.3  t _{13}  7.9  t _{25}  13.6  t _{37}  20.3  
t _{2}  0.6  t _{14}  8.2  t _{26}  15.6  t _{38}  20.4  
t _{3}  0.8  0.9  t _{15}  8.6  t _{27}  17.3  t _{39}  20.45  
t _{4}  1  t _{16}  9.8  t _{28}  17.4  t _{40}  20.5  
t _{5}  1.1  t _{17}  10.4  t _{29}  18.5  t _{41}  20.7  
t _{6}  1.5  t _{18}  11.2  t _{30}  18.9  t _{42}  20.8  
t _{7}  3.8  t _{19}  11.5  t _{31}  19.1  t _{43}  21.5  
t _{8}  4.1  t _{20}  12.4  12.5  t _{32}  19.2  t _{44}  21.6  
t _{9}  4.4  t _{21}  12.6  t _{33}  19.25  t _{45}  22.3  
t _{10}  5.4  t _{22}  13.3  t _{34}  19.3  t _{46}  22.9  
t _{11}  5.7  t _{23}  13.4  t _{35}  19.35  T  24 
Initial conditions
The unit of the initial conditions is nM.
M_{ P }(0)=2.188M_{ C }(0)=1.633,M_{ B }(0)=9.498,P_{ C }(0)=2.008,C_{ C }(0)=1.884,P_{ CP }(0)=0.129,C_{ CP }(0)=0.473, PC_{ C }(0)=1.228,PC_{ N }(0)=0.177,PC_{ CP }(0)=0.203, PC_{ NP }(0)= 0.101,B_{ C }(0) =2.523,B_{ CP }(0)=0.929,B_{ N }(0)=1.787,B_{ NP }(0)=0.318,I_{ N }(0)=0.051
Appendix C: Switching times
See Table 4.
Appendix D: Neglected processes
First reduced model
Neglected processes are: f_{1,3}, f_{2,3}, f_{3,3}, f_{4,6}, f_{5,3}, f_{5,6}, f_{6,4}, f_{7,4}, f_{8,2}, f_{8,7}, f_{9,6}, f_{9,7}, f_{10,4}, f_{11,4}, f_{12,3}, f_{12,6}, f_{13,2}, f_{13,4}, f_{14,2}, f_{14,6}, f_{14,7}, f_{15,4}, f_{16,1}, f_{16,4}.
Second reduced model: submodels
Neglected processes in SM1 are: f_{1,3}, f_{2,3}, f_{3,3}, f_{4,6}, f_{5,3}, f_{5,6}, f_{6,4}, f_{7,4}, f_{8,2}, f_{8,7}, f_{9,6}, f_{9,7}, f_{10,4}, f_{11,4}, f_{12,3}, f_{12,6}, f_{13,2}, f_{13,4}, f_{14,2}, f_{14,6}, f_{14,7}, f_{15,4}, f_{16,1}, f_{16,4}.
In SM2, we supposed that processes switching state from t_{1}=0.33 until t_{6}=1.5 change simultaneously at time \(t_{1}^{r}=0.9\). Deleted processes are common to those removed in SM1, as well as: f_{4,3}, f_{4,4}, f_{5,4}, f_{7,2}, f_{8,3},f_{8,5}, f_{9,2}, f_{9,3}, f_{10,2}, f_{14,5}.
In SM3, we supposed that processes switching state from t_{7}=3.8 until t_{6}=1.5 change simultaneously at time \(t_{2}^{r}=6\). Deleted processes are common to those removed in SM1, as well as: f_{1,1}, f_{2,1}, f_{4,2}, f_{4,3}, f_{5,2}, f_{7,2}, f_{8,1}, f_{9,1}, f_{9,2}, f_{10,2}, f_{11,2}, f_{12,5}, f_{14,4}.
In SM4, we supposed that processes switching state from t_{16}=9.8 until t_{26}=15.6 change simultaneously at time \(t_{3}^{r}=12.5\). Deleted processes are common to those removed in SM1, as well as: f_{4,1}, f_{5,1}, f_{8,4}, f_{9,2}, f_{10,2}, f_{11,2},f_{12,5}, f_{14,4}.
In SM5, we supposed that processes switching state from t_{27}=17.3 until t_{46}=22.9 change simultaneously at time \(t_{4}^{r}=20.0\). Deleted processes are common to those removed in SM1, as well as: f_{4,3}, f_{4,4}, f_{5,4}, f_{7,2}, f_{8,3}, f_{8,5}, f_{9,2}, f_{9,3}, f_{10,2}, f_{14,5}.
Appendix E: Dynamical Process Maps
Footnotes
 1.
Total protein concentrations are defined as follows: P_{ Tot }=P_{ C }+P_{ CP }+PC_{ C }+PC_{ N }+PC_{ CP }+PC_{ NP }+I_{ N }, C_{ Tot }=C_{ C }+C_{ CP }+PC_{ C }+PC_{ N }+PC_{ CP }+PC_{ NP }+I_{ N }, B_{ Tot }=B_{ C }+B_{ CP }+B_{ N }+B_{ NP }+I_{ N }.
 2.
planor: Generation of regular factorial designs https://CRAN.Rproject.org/package=planor
 3.
multisensi: Multivariate Sensitivity Analysis https://CRAN.Rproject.org/package=multisensi
Notes
Acknowledgements
The authors would thank the research program LABEX SIGNALIFE (ANR11LABX002801) and the anonymous reviewers for helpful comments.
Funding
We acknowledge the Conseil Régional PACA and the Investissements d’Avenir Bioinformatique programme under project RESET (ANR11BINF0005) for funding the PhD thesis of S. Casagranda.
Availability of data and materials
All data generated or analysed during this study are included in the published article and the Appendix. Matlab scripts to run Principal Process Analysis on the circadian clock model are available at the following link: http://wwwsop.inria.fr/members/JeanLuc.Gouze/BMBcode/code.zip.
Authors’ contributions
SC, DR and JLG designed the study and developed the methodology. SC performed the analysis. ST contributed to parameter sensitivity analysis. All authors discussed the results and contributed to the final manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Ethics approval and consent to participate
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
 1.Bettenbrock K, Fischer S, Kremling A, Jahreis K, Sauter T, Gilles ED. A quantitative approach to catabolite repression in Escherichia coli. J Biol Chem. 2006; 281(5):2578–84.CrossRefPubMedGoogle Scholar
 2.Kuepfer L, Peter M, Sauer U, Stelling J. Ensemble modeling for analysis of cell signaling dynamics. Nat Biotechnol. 2007; 25(9):1001–6.CrossRefPubMedGoogle Scholar
 3.Snowden TJ, van der Graaf PH, Tindall MJ. Methods of model reduction for largescale biological systems: a survey of current methods and trends. Bull Math Biol. 2017; 79(7):1449–86.CrossRefPubMedPubMedCentralGoogle Scholar
 4.Apri M, de Gee M, Molenaar J. Complexity reduction preserving dynamical behavior of biochemical networks. J Theor Biol. 2012; 304:16–26.CrossRefPubMedGoogle Scholar
 5.Petzold L, Zhu W. Model reduction for chemical kinetics: An optimization approach. AIChE J. 1999; 45(4):869–86.CrossRefGoogle Scholar
 6.Sunnåker M, Cedersund G, Jirstrand M. A method for zooming of nonlinear models of biochemical systems. BMC Syst Biol. 2011; 5(1):140.CrossRefPubMedPubMedCentralGoogle Scholar
 7.Gorban AN, Karlin IV. Method of invariant manifold for chemical kinetics. Chem Eng Sci. 2003; 58(21):4751–68.CrossRefGoogle Scholar
 8.Anderson J, Chang YC, Papachristodoulou A. Model decomposition and reduction tools for largescale networks in systems biology. Automatica. 2011; 47(6):1165–74.CrossRefGoogle Scholar
 9.Hangos KM, Gábor A, Szederkényi G. Model reduction in biochemical reaction networks with MichaelisMenten kinetics. In: Control Conference (ECC), 2013 European. Zürich: IEEE: 2013. p. 4478–4483.Google Scholar
 10.Segel LA, Slemrod M. The quasisteadystate assumption: a case study in perturbation. SIAM Rev. 1989; 31(3):446–77.CrossRefGoogle Scholar
 11.de Jong H, Gouzé JL, Hernandez C, Page M, Sari T, Geiselmann J. Qualitative simulation of genetic regulatory networks using piecewiselinear models. Bull Math Biol. 2004; 66(2):301–40.CrossRefPubMedGoogle Scholar
 12.Baldazzi V, Ropers D, Markowicz Y, Kahn D, Geiselmann J, de Jong H. The carbon assimilation network in Escherichia coli is densely connected and largely signdetermined by directions of metabolic fluxes. PLoS Comput Biol. 2010; 6(6):1000812.CrossRefGoogle Scholar
 13.Bhattacharjee B, Schwer DA, Barton PI, Green WH. Optimallyreduced kinetic models: reaction elimination in largescale kinetic mechanisms. Combust Flame. 2003; 135(3):191–208.CrossRefGoogle Scholar
 14.Casagranda S, Ropers D, Gouzé JL. Model reduction and process analysis of biological models. In: 2015 23rd Mediterranean Conference on Control and Automation (MED). Torremolinos: IEEE: 2015. p. 1132–9.Google Scholar
 15.Leloup JC, Goldbeter A. A model for circadian rhythms in Drosophila incorporating the formation of a complex between the PER and TIM proteins. J Biol Rhythm. 1998; 13(1):70–87.CrossRefGoogle Scholar
 16.KwangHyun C, SungYoung S, HyunWoo K, Wolkenhauer O, McFerran B, Kolch W. Mathematical modeling of the influence of RKIP on the ERK signaling pathway In: Priami C, editor. Computational Methods in Systems Biology. Rovereto: Springer: 2003. p. 127–41.Google Scholar
 17.Leloup JC, Goldbeter A. Modeling the mammalian circadian clock: sensitivity analysis and multiplicity of oscillatory mechanisms. J Theor Biol. 2004; 230(4):541–62.CrossRefPubMedGoogle Scholar
 18.Khalil HK. Nonlinear Systems, Second edn. New Jersey: Prentice Hall; 1996.Google Scholar
 19.Leloup JC, Goldbeter A. Toward a detailed computational model for the mammalian circadian clock. Proc Natl Acad Sci. 2003; 100(12):7051–6.CrossRefPubMedPubMedCentralGoogle Scholar
 20.Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY. An efficient kmeans clustering algorithm: Analysis and implementation. IEEE Trans Pattern Anal Mach Intell. 2002; 24(7):881–92.CrossRefGoogle Scholar
 21.Casagranda S, Gouzé JL. Principal Process Analysis and reduction of biological models with order of magnitude. IFACPapersOnLine. 2017; 50(1):12661–6.CrossRefGoogle Scholar
 22.Kobilinsky A, Monod H, Bailey RA. Automatic generation of generalised regular factorial designs. Comput Stat Data Anal. 2017; 113:311–29.CrossRefGoogle Scholar
 23.Lamboni M, Monod H, Makowski D. Multivariate sensitivity analysis to measure global contribution of input factors in dynamic models. Reliab Eng Syst Saf. 2011; 96(4):450–9.CrossRefGoogle Scholar
 24.Box GE, Hunter JS. The 2 kp fractional factorial designs. Technometrics. 1961; 3(3):311–51.Google Scholar
 25.Zheng B, Albrecht U, Kaasik K, Sage M, Lu W, Vaishnav S, Li Q, Sun ZS, Eichele G, Bradley A, et al. Nonredundant roles of the mPer1 and mPer2 genes in the mammalian circadian clock. Cell. 2001; 105(5):683–94.CrossRefPubMedGoogle Scholar
 26.Van Der Horst GT, Muijtjens M, Kobayashi K, Takano R, Kanno Si, Takao M, de Wit J, Verkerk A, Eker AP, van Leenen D, et al. Mammalian Cry1 and Cry2 are essential for maintenance of circadian rhythms. Nature. 1999; 398(6728):627–630.CrossRefPubMedGoogle Scholar
 27.Tyson JJ, Hong CI, Thron CD, Novak B. A simple model of circadian rhythms based on dimerization and proteolysis of PER and TIM. Biophys J. 1999; 77(5):2411–7.CrossRefPubMedPubMedCentralGoogle Scholar
 28.Pagel H, Poll C, Ingwersen J, Kandeler E, Streck T. Modeling coupled pesticide degradation and organic matter turnover: From gene abundance to process rates. Soil Biol Biochem. 2016; 103:349–64.CrossRefGoogle Scholar
 29.RoblesRodriguez C, Bideaux C, Guillouet S, Gorret N, Roux G, MolinaJouve C, AcevesLara C. Multiobjective particle swarm optimization (MOPSO) of lipid accumulation in fedbatch cultures. In: 2016 24th Mediterranean Conference on Control and Automation (MED). Athens: IEEE: 2016. p. 979–984.Google Scholar
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.