Background

Mathematical modelling has been used for decades as an approach to understand the functioning of biological systems in terms of their internal processes and components. The latter form complex networks that vary in nature. For instance, biochemical networks include processes controlling the intracellular level of metabolites, RNAs and proteins, which allow cells to live and grow. A process either corresponds to a single biochemical reaction, for example protein phosphorylation, or encompasses many biochemical reactions like those involved in general cell functions (translation of proteins, transcription of RNAs...). In ecological networks, the processes can refer to events influencing the distribution and abundance of organisms, or to fluxes of energy and matter.

Numerous kinetic models of these networks have been developed in computational biology, of increasing complexity due to advances in modelling and parameter estimation approaches (see [1, 2] for an example). Complexity arises from the high dimension of the networks, the large number of biological processes involved and their non linearity due to the complex feedback loops that regulate them.

One approach often used to tackle the problem of complexity is model reduction (see [3] for a recent review). The simplified models are easier to analyse, while retaining the main features of the original ones and their biological significance. Briefly, methods of model reduction shorten the list of network species or of network reactions (e.g. [4, 5]), lump state variables (e.g. [6]) or decompose the system into slow and fast dynamics (e.g. [79]). The often used quasi-steady-state approximation falls in the latter category (e.g. [10]). Other approaches simplify the mathematical functions describing the molecular processes. For instance, piece-wise affine differential equations approximate by step functions the sigmoidal functions used to describe the regulation of gene expression. The dynamics of the simplified system can be easily analysed by means of state transition graphs [11]. However, these simplifications are generally restricted to models of gene expression and are more difficult to apply to other types of networks [12].

Reduction approaches have proven successful to significantly reduce model complexity, but they do not provide a mean to understand how the system dynamics emerges from the cascade of biological processes and regulatory mechanisms at work. This is especially true when the reduced models remain complex, with many coupled equations sharing common processes and involving complex feedback loops. For instance, regulatory mechanisms switch on certain biological processes at some times and off at others. It is thus important for a good understanding of the system behaviour to identify which and when processes significantly influence the system dynamics. In other words, instead of analysing a single reduced model in place of the original one, valid on the whole time interval, we may want to analyse series of simplified models highlighting the important processes of the original model during the periods of time in which they are active.

This is how we address the problem of high dimensional model analysis in this study. We develop a mathematical and numerical approach based on the boolean concept of activity/inactivity. The method, called Principal Process Analysis (PPA), determines the contribution of each biological process to the output of the dynamical system. In models of biological networks, these processes appear in a linear additive manner in each ODE. We first identify the inactive processes and neglect them. In a second step, we treat processes whose activity varies along time: we define time windows in which these processes are either always active or always inactive. We eventually create sub-models for each time window that only contain the active processes. This procedure leads to the simplification of the system to its core mechanisms. The simplified system can be further studied, to understand the role of each active process in the system dynamics.

PPA is a general approach that can be easily applied to any biological system described by ordinary differential equations (ODEs). It shares common features with a model reduction method focusing on major model parameters rather than processes [4], in which parameters that are not required for the system behaviour are removed. Another approach dedicated to chemical reactions identifies and removes chemical species that contribute less to the model output [5]. In this case, the problem is solved using optimization approaches (see also [13]). Despite these similarities, PPA is not a model reduction approach. It provides a mean to access to and dissect the more complex dynamics of the original model through the analysis of simplified versions in given time windows. Results are easily interpretable and do not require additional and complicated computations.

Preliminary work on PPA has been described in an earlier conference paper [14], in which we applied PPA to two ODE models of biochemical networks whose simplification preserved their dynamical behaviour: the model of circadian rhythms in Drosophila [15] and the model of the regulation of the ERK signalling pathway [16]. Questions remained open though, concerning the scalability of the approach and its robustness: to which extent does PPA preserve model dynamics in systems of higher dimension, with many more biological processes involved and including interlocked feedback loops? And since the approach requires a priori knowledge of the parameter values, how sensitive are process activities or inactivities to the value of these parameters? In this study, we address these questions by studying a much more complex model of circadian rhythms in mammals, including 16 variables, 76 processes, and intertwined positive and negative feedback loops [17]. Parameter sensitivity analysis of the global relative error between the original and reduced systems allows us to assess the quality and robustness of our approach.

The paper is organized as follows. “Methods” section describes the principle of Principal Process Analysis as well as global sensitivity analysis. “Model description” section introduces the model of mammalian circadian clock. We apply our approach to this complex model in “Principal Process Analysis of the circadian clock model” to “Influence of parameter values” sections, before concluding in “Conclusions” section.

Methods

We summarize below the basics of the method of Principal Process Analysis. We will use as running example the 14th variable of the mammalian circadian clock model analysed in “Model description” section (see also Appendix B. It describes how the concentration of the nuclear form of protein BMAL1 (B N =x14) changes:

$$ {}\begin{aligned} \begin{array}{ll} \frac{dB_{N}}{dt}\,=\,&\,-\,V_{3B} \frac{B_{N}}{K_{p}+B_{N}}\,+\,\!V_{4B} \frac{B_{NP}}{K_{dp}+B_{NP}} \,+\,k_{5} B_{C} \,-\, k_{6} B_{N} \,-\, k_{7} B_{N} PC_{N}\\ &+k_{8} I_{N} - k_{dn} B_{N}. \end{array} \end{aligned} $$
(1)

Principal Process Analysis (PPA)

Consider the following ODE model of biological network:

$$ \dot{x}=f\left(x,p\right) $$
(2)

where \(x=\left (x_{1},x_{2},\ldots,x_{n}\right) \epsilon \mathbb {R}^{n}\) is the vector of component concentrations, \(x0=\left (x0_{1},x0_{2},\ldots,x0_{n}\right) \epsilon \mathbb {R}^{n}\) the vector of their initial values and p\(\epsilon \mathbb {R}^{b}\) the vector of parameters. Each equation is decomposed into a sum of biological processes:

$$ \dot{x}_{i}=\sum\limits_{j}f_{ij}\left(x,p\right) $$
(3)

where f ij represents the jth process involved in the dynamical evolution of the ith variable of the system over a period of time [ t0,T].

Example 1

Equation (1) includes seven processes, each associated with a specific biological function. They take a positive or negative value, depending on whether they affect positively or negatively the variation of BMAL1 concentration. The equation of the protein is rewritten as:

$$ \dot{x}_{14}= f_{14,1}+f_{14,2}+f_{14,3}+f_{14,4}+f_{14,5}+f_{14,6}+f_{14,7} $$
(4)

where \(f_{14,1}= - V_{3B} \frac {B_{N}}{K_{p}+B_{N}},...,f_{14,7}= - k_{dn} B_{N} \).

Figure 1a shows the dynamical evolution of processes f14,1 to f14,7 during a day. Nuclear import of BMAL1 is the fastest process of Eq. (1), while the basal degradation of the protein is the slowest.

Fig. 1
figure 1

Dynamics of processes that change the nuclear concentration of protein BMAL1 (B N , see Eqs. (1) and (4)) over a 24-h time window. a Absolute value of the processes along time (one colour per process). b Weights associated with the processes along time. The threshold δ is set at 0.1

Comparison criteria are needed to weigh the influence of the different processes f ij on the time evolution of each variable x i . There are several alternatives. For instance, we can compare their absolute value (|f ij (x,p)|), scale it by the ith initial condition \(\left (\frac {|f_{ij}(x(t),p)|}{x_{0i}} \right)\), or scale it by the solution of the ith ODE \(\left (\frac {|f_{ij}(x(t),p)|}{x(t)_{i}}\right)\). In this work we associate a relative weight with each process to make it dimensionless:

$$ W_{ij}(t,p)=\frac{|f_{ij}(x(t),p)|}{\sum_{j}|f_{ij}(x(t),p)|} $$
(5)

where 0≤W ij (t,p)≤ 1 and \(\sum _{j}W_{ij}(t,p)=1\).

Definition 1

Let the continuous function f ij (x(t),p) be the jth process of \(\dot {x}_{i}(t)\) in t ε[t0,T] and let the threshold δε [0,1]. We call a process f ij (x(t),p) always inactive when W ij (t,p)<δ∀ t ε [0,T]. We call a process f ij (x(t),p) inactive at time t when W ij (t,p)<δ. We call a process f ij (x(t),p) active at time t when W ij (t,p)≥δ. Switching time for a process f ij (x(t),p) is the time \(t_{ij}^{s}\) at which W ij (t,p)=δ. A process can have 0,1,…,z switching times. The switching time set S i for the ith variable contains all the switching times \(t_{ij}^{s}\) where j=1,..,k and s=1,…,z. The global switching time set S is the union of all S i .

The choice of δ is important, since it determines above which weight a process can be considered active or inactive and, as we will see it later, if the process should be kept or omitted in the simplified model. An excessively high value might lead to an oversimplified model, without many dynamical features of the original model. Conversely a very low value might result in a model insufficiently simplified, which remains too complicated to analyse. From our experience, a convenient value is δε [0, 0.1], where the value of δ can be adjusted to the number of processes. For instance, if an ODE contains numerous processes of similar value, each individual process weighs little. In this case, δ should not be chosen too high to avoid omitting all these processes; it can be inversely proportional to the total number N of processes in the equation: \(\delta \propto \frac {1}{N}\). In this paper, fine-tuning the threshold value is not justified: there are not many processes per equation and they have very different values. We will always take δ=0.1.

Example 2

We apply Eq. (5) to determine the dynamical weight of the seven processes in Eq. (1). Results are shown in Fig. 1b. As expected, the nuclear import, which is the fastest process, weighs more in the dynamical evolution of BMAL1 concentration, while the basal degradation of the protein weighs little. We determine the process activities using δ=0.1:

  • The weight of processes W14,2, W14,6, W14,7 is always below δ: their related processes f14,2, f14,6, f14,7 are thus always inactive;

  • The processes W14,1 and W14,3 are always above δ: f14,1 and f14,3 are active during the whole system dynamics;

  • The weight of processes W14,4 and W14,5 crosses the threshold twice and the switching times \(t_{14,4}^{1}=4.4\)h, \(t_{14,4}^{2}=20.7\)h, \(t_{14,5}^{1}=0.8\)h and \(t_{14,5}^{2}=20.3\)h are collected in the set S14.

Visualization of process activities

Graphical tools turn out to be useful to analyse the dynamical weights of complex systems such as the mammalian circadian clock model. We use three of them in PPA, which are described below.

  • The Boolean Process Map summarizes qualitatively the knowledge of the process activity or inactivity along time for each variable. A black bar means that the process is active, while the white bar indicates an inactive process.

  • Example: The Boolean Process Map in Fig. 2a represents the process activities deduced from the dynamical weights in Fig. 1b. We observe that there is always an active phosphorylation of BMAL1 in the nucleus, while the basal degradation can be considered always inactive. The nuclear export is solely active in the first and last periods of time.

    Fig. 2
    figure 2

    Visual tools. aBoolean Process Map, bDynamical Process Map between times \(t^{1}_{14,4}\) and \(t^{1}_{14,5}\), c3-D Process Map for the variable x14 and its corresponding 2-D version

  • The Dynamical Process Map is a network representation of the process activities. Variables (represented by boxes) are connected by processes (arrows). Three cases arise, which depend on the activity of processes shared by several variables: black-coloured arrows represent processes that are inactive for all variables involved, while active processes are displayed as red arrows. Yellow arrows are used for processes shared by several variables that have different activities: for instance, one process is considered active in one equation, but inactive in another one. Note that the model simplification by elimination of inactive processes, as will be described in “Model simplification by elimination of always inactive processes” section, will have for effect to remove black arrows in the Dynamical Process Map.

  • Example 3 Figure 2 b represents the Dynamical Process Map for x14, the nuclear concentration of Bmal1, in the time interval between \(t^{1}_{14,4}\) and \(t^{1}_{14,5}\). Phosphorylation is an example of active process for the nuclear BMAL1 concentration (see the Boolean Process Map in Panel A). It is shown in red because it is also considered active at the same moment for the other variable sharing this process, the concentration of phosphorylated BMAL1.

  • The 3-D Process Map represents the time-dependent evolution of the intensity of each process. Process activities are averaged per hour, which leads to a discretisation of time. Vertical bars represent process weights for each hour. Their color code represents the intensity of process weights relatively to the other weights.

  • Example 4 Figure 2 c describes the 3-D Process Map for the concentration of nuclear BMAL1. The phosphorylation of the protein, its nuclear import and its consumption for the formation of a large complex are the processes the most active over time.

Model simplification by elimination of always inactive processes

Eliminating processes that play a minor role in the system dynamics facilitates the analysis of large models. Since in the previous steps of PPA we have determined the process activities in system (2), we now neglect processes that are considered always inactive. This will give us g(xr), the function approximating f(x) in (2) with less processes.

We thus introduce the ODE system (6), which approximates system (2):

$$ \dot{x}^{r}=g\left(x^{r},p^{r}\right) $$
(6)

where \(x^{r}=\left (x_{1}^{r},x_{2}^{r},\ldots,x_{n}^{r}\right) \epsilon \mathbb {R}^{n}\) is the vector of component concentrations, \(x0=\left (x0_{1},x0_{2},\ldots,x0_{n}\right) \epsilon \mathbb {R}^{n}\) the vector of their initial values, and \(p^{r} \epsilon \mathbb {R}^{c}\), where cb is the vector of parameters. The model simplification approach relies basically on the following theorem: if the vector fields of two systems are close (f(x)≈g(x)), then the solutions of the original and approximated systems are close during some time interval under the assumptions on the Lipschitz conditions listed in [18, p. 79, Th. 2.5].

Based on the dynamical weights determined in “Principal Process Analysis (PPA)” section and the threshold value δ, we apply the following rule to define g(xr,pr): if W ij (x(t),p) < δ∀ t ε [ t0,T]theng ij = 0; if not, g ij f ij .

We thus define xr as an approximation of x and pr as a subset of p.

Example 5

We proceed to the simplification of processes in Eq. 1. Because f14,2, f14,6, f14,7 are always inactive, g14,2=0, g14,6=0, g14,7=0 and g14,1f14,1, g14,3f14,3, g14,4f14,4, g14,5f14,5. The resulting ODE for \(x^{r}_{14}\) is:

$$ \frac{dB^{r}_{N}}{dt}=-V_{3B} \frac{B^{r}_{N}}{K_{p}+B^{r}_{N}}+ k_{5} B^{r}_{C} - k_{6} B^{r}_{N} - k_{7} B^{r}_{N} PC^{r}_{N}. $$
(7)

Note that Principal Process Analysis is applied to each ODE separately. As a consequence, processes shared by two equations can be active in one equation, but inactive in the other. Elimination of the inactive processes breaks mass balance in the simplified model. For our purpose, this is not an issue: the simplification does not aim at reducing the model, but rather analysing a sub-model of the original one, which describes the dynamics of the important phenomena.

It is interesting to quantify the extent to which the simplified system (6) preserves the behaviour of the original one. This gives a better sense of how the active processes kept in the simplified model are responsible for the dynamics of the original system. In addition, this helps identifying potential problems related to the model simplification, for instance involving a wrong choice of the δ value. One can also imagine pathological cases, when the simplified system does not reproduce the main dynamical features of the original model: for instance, if it evolves towards a different basin of attraction or if the removal of a consumption term does not compensate a synthesis term any more, leading the simplified system to explode in finite time. It is non nonsensical in all these cases to analyse simplified models that behave so differently from the original ones. The δ threshold should be adjusted to a new value and Principle Process Analysis re-run until model simplification proves satisfactory according to the criteria described below.

We present in Appendix A an a priori analysis of the error made when removing some inactive processes. This analysis gives a theoretical, but very conservative, bound on the error. In practice, we numerically compute the global relative error between the original and simplified models. Several forms of error are possible. We have chosen the following one, analysed over a period of time [ t0,T], in which y h and \(y^{r}_{h}\) are the hth outputs of the original and simplified systems, respectively:

$$ e_{h}=\frac{\int_{t_{0}}^{T}|y_{h}(t)-y_{h}^{r}(t)|dt}{\int_{t_{0}}^{T}|y_{h}(t)|dt}. $$
(8)

How to choose the model outputs? They can correspond to all model variables or combinations of them, if the latter are involved in some biological phenomena of interest for instance. In the case of the circadian clock model, six variables were specifically studied in the original papers [17, 19], which we will use as outputs to determine the global relative error between the original and simplified models: the concentrations of Per mRNA (M P ), Cry mRNA (M C ), Bmal1 mRNA (M B ), total PER protein (P Tot ), total CRY protein (C Tot ) and total BMAL1 protein (B Tot )Footnote 1.

Creation of sequences of sub-models

In the previous step of PPA, the models are simplified by elimination of always inactive processes. Here we go one step further in the simplification, by eliminating processes that are inactive at times. This is achieved by decomposing the period of time during which the system evolves into time intervals. To that end we use the switching times t b (with b=1,…,d) determined in “Principal Process Analysis (PPA)” section: this allows creating a succession of sub-models for each time interval, which contain the core mechanisms in that period of time.

To avoid creating large sequences of sub-models, we reduce the number of time windows by grouping proximal switching times with the easy-to-compute k-means clustering [20]. Hence the dswitching times included in the global switching time set S=[t1,t2,...,t d ] are grouped into z (≤d) clusters C= {C1,C2,...,C z }, so as to minimize the within-cluster sum of square (or within-cluster inertia):

$$ \mathrm{argmin_{C}}\sum\limits_{v=1}^{z}\sum_{t\epsilon C_{v}}||t-\mu_{v}||^{2} $$
(9)

where μ v is the mean of the switching times in C v . The consequence is that processes with switching times belonging to cluster C v are assumed to switch together at the same time \(t^{r}_{v}=\mu _{v}\), the mean switching time in cluster C v .

How to define the right number of clusters? A too large number of clusters will result in a low error, but also in numerous time windows that make the simplified models still too complex to analyse. Equation (10) describes how to take into account this trade-off between the number z of clusters and the error. It is related to the difference between the maximum and the minimum number of active processes during the temporal evolution of the system: if this difference is low, z should be chosen low as well. We thus define z, rounded to the nearest number, as:

$$ z = \frac {\max\limits_{v}(n^{v}_{act})-\min\limits_{v}(n^{v}_{act})}{2}, $$
(10)

where \(n^{v}_{act}\) denotes the number of active processes in the vth time window.

We eventually end up with a sequence of z+1 sub-models in the time interval [ 0,T], the first one being valid in \(\left [0,t^{r}_{1}\right ]\) and the last one, in \(\left [t^{r}_{z},T\right ]\).

Similarly to the global errors determined in “Model simplification by elimination of always inactive processes” section, we can also assess how the newly simplified models reproduce the dynamical behaviour of the original model in each time window \(\left [t^{r}_{v-1},t^{r}_{v}\right ]\), by measuring the error:

$$ e^{v}_{h}=\frac{\int_{t^{r}_{v-1}}^{t^{r}_{v}}|y_{h}(t)-y_{h}^{r}(t)|dt}{\int_{t^{r}_{v-1}}^{t^{r}_{v}}|y_{h}(t)|dt}. $$
(11)

We compute the error (11) between the original model and each sub-model, with or without propagating errors: in the first case, for each time window \(\left [t^{r}_{v-1},t^{r}_{v}\right ]\) (v=1,…,z+1 with \(t^{r}_{0}=t_{0}\) and \(t^{r}_{z+1}=T\)), the initial values of the h outputs of sub-model SM v are equal to the final values at \(t^{r}_{v-1}\) of sub-model SMv−1; in the second case, they are equal to the values of the original model at \(t^{r}_{v-1}\).

Global sensitivity analysis

Principal Process Analysis is applied to models with given parameter values and initial conditions. It may be questioned whether the uncertainty of their values influences the simplification of the model and thus, the analysis of the system dynamics. While we have shown PPA to be robust to variations of initial conditions in [21], the question remains open for parameter values.

To that aim, we perform global sensitivity analyses on the global relative errors between the original model and the reduced model (defined in Eq. (11)). Such an analysis consists in quantifying the parameter influence on the error, while varying the parameters simultaneously in given ranges. In contrast, in a local sensitivity analysis, parameters would vary one-at-a-time in the neighbourhood of their nominal value. First, we perform an analysis on each of the six errors defined for the six model outputs \( \left (e^{v}_{M_{P}},e^{v}_{M_{C}},e^{v}_{M_{B}},e_{P_{Tot}},e^{v}_{C_{Tot}},e^{v}_{B_{Tot}}\right)\). Then, in a more detailed analysis, we compute the global relative error for each state variable, according to Eq.(11) (with y h =x i ,i=1,…,16); sensitivity analyses are also performed on each of these 16 errors. The method used is based on factorial design [22], analysis of variance (ANOVA) and principal component analysis (PCA) [23].

We first explore the parameter space using a factorial design. We vary N f =51 parameters of the model [17] (see “Model description” section). We choose N l =2 levels for each parameter p f (or factor): \(p^{-}_{f}= 0.8\: p_{f}\) and \(p^{+}_{f}=1.2\: p_{f}\). A full factorial design, defined as all possible combinations of the parameter levels, would be necessary to estimate the main effects and interactions of all parameters. Such a full design corresponds to \(N_{l}^{N_{f}}=2^{51}\) parameter combinations and would necessitate the same number of model simulations to compute the corresponding outputs, which are far too many. Thus we implement a fractional factorial design [24], which is a subset (fraction) of the full design of size \(N_{j}<N_{l}^{N_{f}}\). The design is determined according to a given statistical model linking the error e h to the parameters p f , for each time window \(\left [t^{r}_{v-1},t^{r}_{v}\right ]\). We choose a second order linear model, which incorporates all main effects and two-way interactions as follows:

$$ e^{v}_{h,j}=\mu^{v}_{h}+\sum\limits_{f=1}^{N_{f}}\alpha^{v}_{h,f(j)}+\sum\limits_{f=1}^{N_{f}}\:\sum\limits_{k=1, k\neq f}^{N_{f}}\beta^{v}_{h,f(j)k(j)}+\epsilon^{v}_{h,j}, $$
(12)

where \(e^{v}_{h,j}\) is the error computed according to Eq. (8) for output (or state variable) h, time window v, and parameter combination j (j=1,…,N j ) of the fractional factorial design; \(\mu ^{v}_{h}\) is the grand mean; \(\alpha ^{v}_{h,f(j)}\) is the main effect of parameter p f for parameter combination j; \(\beta ^{v}_{h,f(j)k(j)}\) is the interaction effect between parameters p f and p k (kf) for parameter combination j; and \(\epsilon ^{v}_{h,j}\) is the residual. Each main effect \(\alpha ^{v}_{h,f(j)}\) can take two values, according to the level of parameter p f in combination j: \(\alpha ^{v}_{h,f^{+}}\) or \(\alpha ^{v}_{h,f^{-}}\). Similarly, each two-way interaction effect can take four values: \(\beta ^{v}_{h,f^{+}k^{+}},\: \beta ^{v}_{h,f^{+}k^{-}},\: \beta ^{v}_{h,f^{-}k^{+}},\: \beta ^{v}_{h,f^{-}k^{-}}\). The fractional factorial design determines the parameter combinations needed to estimate all main effects and two-way interactions. It is obtained using R package planorFootnote 2 and consists of N j =212 parameter combinations, yielding as many simulations. According to the sparsity-of-effects principle, a system is usually dominated by main effects and low order interactions, so neglecting third-order and higher interactions can still provide good estimates.

An ANOVA is then performed on these simulations, for each error e h . It consists in estimating the grand mean, main effects and interaction terms of model (12), using a least-square criterion to minimise the residuals. It is based on the following variance decomposition:

$$ \underbrace{\sum\limits_{j=1}^{N_{j}}\left(e^{v}_{h,j}-\overline{e^{v}_{h}}\right)^{2}}_{SS_{T}^{h,v}} = \underbrace{\sum\limits_{j=1}^{N_{j}}\left(\widehat{e^{v}_{h,j}}-\overline{e^{v}_{h}}\right)^{2}}_{SS_{M}^{h,v}} + \underbrace{\sum\limits_{j=1}^{N_{j}}\left(e^{v}_{h,j}-\widehat{e^{v}_{h,j}}\right)^{2}}_{SS_{r}^{h,v}=\sum_{j}(\epsilon^{v}_{h,j})^{2}}, $$
(13)

where \(\overline {e^{v}_{h}}\) is the mean error computed over all N j simulations of the fractional factorial design; and \(\widehat {e^{v}_{h,j}}=\widehat {\mu ^{v}_{h}}+\sum _{f}\widehat {\alpha ^{v}_{h,f(j)}}+\sum _{f}\sum _{k\neq f}\widehat {\beta ^{v}_{h,f(j)k(j)}}\) (\(\widehat {~}\) denoting an estimated value) is the error estimated from the linear model (12) for parameter combination j. The total sum of squares \(SS_{T}^{h,v}\) is split into the sum of squares attributed to the model \(SS_{M}^{h,v}\) and the residual sum of squares \(SS_{r}^{h,v}\), the latter corresponding to the criterion that is minimised. In turn, \(SS_{M}^{h,v}\) is split into sum of squares attributed to each main effect \(\alpha ^{v}_{h,f}\) and two-way interaction term \(\beta ^{v}_{h,fk}\), denoted respectively \(SS_{f}^{h,v}\) and \(SS_{fk}^{h,v}\). The total sensitivity index of parameter p f is then defined as follows:

$$ tSI_{f}^{h,v}=\frac{SS_{f}^{h,v}+ \sum_{k\neq f} SS_{f,k}^{h,v}}{SS_{T}^{h,v}}. $$
(14)

Noting that the variance of error \(e^{v}_{h}\) computed over all N j simulations of the fractional factorial design is \(\sigma ^{2}_{e^{v}_{h}}=\frac {1}{N_{j}-1}SS_{T}^{h,v}\), the total sensitivity index \(tSI_{f}^{h,v}\) represents the fraction of the variance explained by parameter p f . As an ANOVA requires a scalar variable, separate sensitivity analyses are performed for each scalar error \(e^{v}_{h}\) and separate indices are computed for each error \(e^{v}_{h}\). To compare the parameter influence on the different errors \(e^{v}_{h}\), we use non normalised indices, obtained by multiplying each \(tSI_{f}^{h,v}\) by the variance of the error:

$$ tSI_{f}^{h,v'}=\sigma^{2}_{e^{v}_{h}} tSI_{f}^{h,v}. $$
(15)

To obtain synthetic sensitivity indices that represent the influence of each parameter on the errors for all 16 state variables, we decompose the error vector (e h :h=1…,16) by PCA (without normalisation). As a result, an inertia proportion ω c can be attributed to each component c (a component is a linear combination of the 16 errors e h ). It represents the variability among all simulations carried by this component. Only the N c first components whose cumulated inertia add up to 95% or more are retained. Moreover, each simulation is given a score on each component, a scalar representing the projection of the simulation on the component. Then, for each component retained, an ANOVA is performed on the scores and total sensitivity indices \(tSI^{c}_{f}\) are computed, as described in Eq. (14). Finally, a total generalised sensitivity index is calculated for each parameter p f as the sum of the total sensitivity indices on each component, weighted by the inertia of the component:

$$ tGSI_{f}= \sum\limits_{c=1}^{N_{c}} w_{c} \: tSI^{c}_{f}. $$
(16)

We use the multisensi R packageFootnote 3 for this analysis.

In what follows, we show how Principal Process Analysis can help with the analysis of complex biological models. We apply the approach to a model of the circadian clock developed in [17, 19], which we describe in the following section.

Results

Model description

Periodic fluctuations of the environment subject living organisms to biological rhythms. The latter are endogenous by nature, but entrained by environmental variations. For instance, circadian rhythms are generated by a molecular clock within cells, which synchronizes daily physiological variations to the day-night alternance. The model we study here describes the circadian clock in mammals [17, 19]. In this model, the clock forms a complicated network of intertwined positive and negative feedback loops involving four clock genes: Per, Cry, Bmal1, and Clock. Their mRNA and protein produce sustained oscillations with a period of 24 hours. Light affects expression of gene Per at the transcriptional level: the first twelve hours of day light increase its transcription rate (up to 1.8 [ μM/h]), while it is lowered in the next twelve hours of darkness (down to 1.5 [ μM/h]). The system functions as follows (for the complete schema, see Fig. 3):

  • Transcription of genes Per, Cry and Bmal1 occurs in the nucleus. The newly synthesized mRNAs are exported to the cytosol.

    Fig. 3
    figure 3

    Schematic representation of the mammalian circadian clock. Light stimulates the transcription of gene Per. The complex CLOCK-BMAL1 inhibits the transcription of gene Bmal1 and activates the transcription of genes Cry and Per. Notations: : degradation product; the different forms of a given protein are noted cyto: cytosolic form, nuc: nuclear form; P: phosphorylated form

  • In the cytosol, the mRNAs can be either degraded or translated into proteins, which ones are subsequently phosphorylated (the process is reversible). Unphosphorylated proteins PER and CRY form the complex PER-CRY, which reversibly enters the nucleus. The nuclear and cytosolic forms of the complex can be phosphorylated. Likewise, protein BMAL1 is reversibly phosphorylated and reversibly enters the nucleus, but sole its unphosphorylated form makes a complex with protein CLOCK. Phosphorylated proteins and complexes in the nucleus or the cytosol are subject to degradation.

  • In the nucleus, the complex CLOCK-BMAL1 activates the transcription of Per and Cry genes. Activation is stopped by binding of the PER-CRY complex to CLOCK-BMAL1, which indirectly inhibits Per and Cry transcription.

  • The concentration of CLOCK protein is not a variable in the model because it is constitutively expressed at high levels and considered to be not limiting [17].

The 16 model equations, 56 parameter and 16 initial condition values are shown in Appendix B. The model dynamics is difficult to analyse though, as the circadian clock involves numerous processes, including interlocked positive and negative feedback loops responsible for the oscillatory behaviour of the clock proteins. Reducing the original model around its core active processes can facilitate the model analysis, without changing significantly the original dynamics, in particular the sustained oscillations of the solutions.

Principal Process Analysis of the circadian clock model

We apply Principal Process Analysis to identify major processes of the circadian clock model. To this end we decompose each ordinary differential equation in processes, as shown in Eq. (4) for BMAL1. Each process has a biological interpretation and corresponds to a regulatory mechanism or a biochemical reaction.

We then calculate the relative weight of each process using Eq. (5) and the threshold value δ=0.1.

We collect the switching times (values given in Appendix C) and then build a Boolean Process Map to visualize the activity/inactivity of each process, shown in Fig. 4. We simplify the model by neglecting 24 out of 76 processes, which are always inactive (32% of all processes). They correspond to mRNA and protein basal degradations; cytosolic dephosphorylations of CRY, BMAL1, and PER-CRY; PER-CRY-CLOCK-BMAL1 dissociation in the nucleus; and BMAL1 dephosphorylation in the nucleus. The list of neglected processes is shown in Appendix D.

Fig. 4
figure 4

Activity of the 76 model processes during a 24-h period. Processes are listed in the first column (white background), ordered by variable (blue background). Their activity is depicted in the second column between 0 and 24 h: a horizontal black, resp. white, bar when the process is active, resp. inactive. Values for the switching times are given in Appendix B

We now determine the global relative errors between the original and the simplified model using Eq. (8) for all six outputs (see Table 1). The dynamics of the two models are compared in Fig. 7a. The simplified model preserves qualitatively the trend of the original solutions, as well as their sustained oscillations. The most noticeable difference concerns the peak of the total concentration of protein PER (P Tot ), which corresponds also to the highest error in Table 1 (26.48%): the peak is lower with the reduced model, which also explains the delay between the original and the simplified solutions. Nevertheless the simplified model reproduces qualitatively the oscillatory behaviour of protein PER observed in the original model. The concentrations in the original and simplified models peak at almost the same time. These global relative errors do not call for an adjustment of the threshold value δ. In the next section, we proceed to the second step of the Principal Process Analysis.

Table 1 Global relative errors between the original and reduced models for the six outputs

Creation of sub-models

The simplified model obtained above can be further reduced if we also neglect processes that are sometimes inactive during the system dynamics. Based on the Boolean Process Map and the collected switching times, we identify between 38 and 45 active processes along time (Fig. 5) and a total of 46 switching times (see Fig. 6a). Clustering the switching times into 4 clusters (Fig. 6b) allows us to generate the five sub-models described below. The number of clusters has been chosen according to Eq. (10).

Fig. 5
figure 5

Evolution of the number of active processes as a function of time. The function increases or decreases at switching times, listed in Appendix B

Fig. 6
figure 6

Switching time clustering. aswitching times t b , b=1,...,46 (also listed in Appendix B). b the four switching time clusters (red, green, pink, black) obtained by the k-means method. c the four reduced switching times (\(t_{v}^{r}, v=1,...,4\)), corresponding to the mean switching time within each cluster

Fig. 7
figure 7

Model outputs along time for: a the original model (solid lines) and the reduced model (dashed lines); b the original model (solid lines) and the coupled sub-models without propagation errors (dashed lines); c the coupled sub-models, with (dashed lines) and without (solid lines) propagation errors. d The equations for the total concentration of protein PER (P Tot ), CRY (C Tot ) and BMAL1 (B Tot ) are: P Tot =P C +P CP +PC C +PC N +PC CP +PC NP +I N , C Tot =C C +C CP +PC C +PC N +PC CP +PC NP +I N , B Tot =B C +B CP +B N +B NP +I N

  • SM1, valid from \(t_{0}^{r}=0\) to \(t_{1}^{r}=0.9\) h: neglected processes for this model are always inactive (32% of the total). This model corresponds to the simplified model obtained in “Principal Process Analysis of the circadian clock model” section.

  • SM2, from \(t_{1}^{r}=0.9\) h to \(t_{2}^{r}=6\) h: 46% of the processes are neglected. In addition to the always inactive listed in “Principal Process Analysis of the circadian clock model” section, we have the following inactive processes in this model: cytosolic dephosphorylation of PER, CRY, and PER-CRY; cytosolic dissociation of PER-CRY; nuclear dephosphorylation of PER-CRY; PER-CRY export from the nucleus; and formation of the large complex PER-CRY-CLOCK-BMAL1.

  • SM3, from \(t_{2}^{r}=6\) h to \(t_{3}^{r}=12.5\) h, in which 50% of processes are neglected. In addition to the processes listed in “Principal Process Analysis of the circadian clock model” section, inactive processes are in this case: transcription of Per and Cry mRNAs; cytosolic phosphorylations and dephosphorylations of PER and CRY; cytosolic dephosphorylation of PER-CRY; nuclear phosphorylation and dephosphorylation of PER-CRY; and nuclear export of BMAL1.

  • SM4, from \(t_{3}^{r}=12.5\) h to \(t_{4}^{r}=20\) h, which neglects 42% of processes. The processes include the processes listed in “Principal Process Analysis of the circadian clock model” section, as well as: PER and CRY translation; formation of the PER-CRY complex in the cytosol; PER-CRY dephosphorylation in the cytosol and the nucleus; and export of BMAL1 from the nucleus.

  • SM5, from \(t_{4}^{r}=20\) h to \(t_{5}^{r}= 24\) h, in which 46% of the processes are neglected. With the processes listed in “Principal Process Analysis of the circadian clock model” section, other neglected processes are: cytosolic dephosphorylation of PER and CRY; PER-CRY dissociation in the cytosol; export of PER-CRY; PER-CRY dephosphorylation both in the cytosol and the nucleus; and PER-CRY-CLOCK-BMAL1 formation.

See also Appendix D for the list of neglected processes in each sub-model.

Table 2 gives the global relative errors (11) without propagation error, between the original model and the sub-models for the six outputs and for each time window. Figure 7b illustrates the six model outputs for the original model and the sub-models without propagation errors, while Fig. 7c compares the coupled sub-models with and without propagation error.The simplified models preserve the oscillatory behaviour of the total concentrations of PER, CRY, and BMAL1, albeit with some discrepancies in the amplitude of the oscillations. It is in the third time window that the approximated solution differs the most from the original one (Table 2). This is visible in Fig. 7b in the third time window where the total concentrations of PER and CRY form a much higher peak in the solution of the simplified model. Recall that this error is not an issue, since our objective is primarily the qualitative analysis of the model. It is sufficient that the remaining processes in the simplified model produce a dynamical behaviour qualitatively similar and relatively close to the original model. This shows their important contribution to the system dynamics.

Table 2 Global relative error between the original model and each sub-model without propagation error for the six outputs

Applying a Dynamical Process Map to the third sub-model (Fig. 8; see also “Visualization of process activities” section) shows that the transcription of Per and Cry genes is inactive (black arrow) and that both PER and CRY phosphorylations in the cytosol and in the nucleus are not entirely active (they are not active for all the variables in which they are involved, yellow arrow). In the other time windows these processes are always entirely active (red arrows). This probably explains why we had an higher error in Table 2 for the variable M P , M C , P Tot and C Tot in SM3. The global sensitivity analysis, presented in the next Section, will confirm the validity of this assumption.

Fig. 8
figure 8

Dynamical Process Map for the third time window. Variables (boxes) and processes (arrows) are represented, as well as the process activities: inactive (black); active for all variables involved (red); active for some variables involved (yellow)

Since the dynamics of the coupled sub-models remain close to the original one, we can further analyse the behaviour of the network simplified to its core processes. We use the Dynamical Process Maps for the different sub-models (Appendix E), together with the process activities in Fig. 4 and the model outputs in Fig. 7. The simplified models preserve the three main interlocked feedback loops described in the original model, one positive and two negative loops. The functioning of these loops is directly affected by changes of process activities. Among the two negative feedback loops, which one is the main oscillator? One negative feedback loop involves the inhibition of Bmal1 transcription by the nuclear form of BMAL1 associated to the protein CLOCK. If this mechanism is the main source of oscillations, we should observe wide changes in process activities controlling BMAL1 levels. The total concentration of the protein does not vary much in amplitude (Fig. 7). It mainly decreases in SM2 and SM3, when the concentration of PER-CRY is also high and forms a complex with CLOCK-BMAL1, which is subsequently degraded. This degradation process is active most of the time (Fig. 4 and Appendix E), but variations of the total BMAL1 concentration do not modify strongly the transcription of Bmal1 mRNA, which remains always active. As well, the other processes of translation, phosphorylation and degradation for this variable almost never switch between inactive and active states over time (Fig. 4 and Appendix E). Overall, this suggests that the negative feedback loop involving CLOCK-BMAL1 is not the main oscillator. A similar conclusion was drawn for the original model in [17].

The other negative feedback loop inhibits Per and Cry transcription through the titration of CLOCK-BMAL1 by PER-CRY to form the inhibitory complex PER-CRY-CLOCK-BMAL1. The total concentration of BMAL1 peaks before that of PER and CRY, as can be seen in Fig. 7 for SM2 and SM3. When its concentration is maximal in SM1 and SM2, the nuclear form of the protein associated to the protein CLOCK stimulates the transcription of Per and Cry genes, in conditions where light has also a stimulatory effect on the transcription of these two genes. The processes of transcription and translation of Per and Cry are active in both models, as a result of which levels of PER and CRY raise to reach their maximal concentration in SM3. As can be seen from the process activities in Fig. 4 and the Dynamical Process Maps in Appendix E, conditions are favourable for the accumulation of high levels of complexes PER-CRY and CLOCK-BMAL1-PER-CRY in the nucleus. For instance, numerous processes decreasing PER, CRY and PER-CRY concentrations in the cytosol and the nucleus are inactive: their phosphorylation is reduced (the process is inactive for the dephosphorylated forms but still active for the phosphorylated ones), which limits their degradation, and the nuclear import of PER-CRY is always active. During the same period of time, the formation of the large complex CLOCK-BMAL1-PER-CRY, which is active for both CLOCK-BMAL1 and PER-CRY (Fig. 4 and Appendix E), suggests that the nuclear forms of PER-CRY and CLOCK-BMAL1 bind as soon as they accumulate in the nucleus. The large complex is immediately degraded since its degradation process is always active and its dissociation, always inactive.

In SM2 and SM3, the degradation of the large complex is not compensated for by other mechanisms allowing BMAL1 accumulation in the nucleus: the cytosolic form of the protein is actively phosphorylated and then degraded, while its dephosphorylation is inactive, which reduces the quantity of protein to be imported in the nucleus (see Fig. 4 and the Dynamical Process Maps in Appendix E). In this compartment, the absence of active dephosphorylation, together with the active protein phosphorylation, also contribute to decrease pools of CLOCK-BMAL1 complexes (Fig. 4, Appendix E). This halts transcription of Per and Cry mRNAs in SM3 (the processes are inactive and light is also switched off towards the end of SM3). This also affects the translation of PER and CRY, which becomes inactive in SM4. Altogether these observations suggest that the negative feedback loop inhibiting Per and Cry transcription via the complex CLOCK-BMAL1-PER-CRY is the main source of circadian oscillations. This is consistent with conclusions in [17], where a second oscillator based on the auto-inhibition of BMAL1 has been obtained for specific parameter values only. These results are also consistent with the observation of arrhythmic behaviours in mutant mice with double knock-out of the Per and Cry genes [25, 26].

The positive feedback loop activates Per and Cry transcription through a control of protein stability mediated by the phosphorylation processes. In the model, sole the phosphorylated forms of the proteins are degraded. We observed that the reversible phosphorylation reactions are often displaced in the forward sense, as dephosphorylation processes are often found inactive. In particular, they contribute to decrease the concentration of PER, CRY and PER-CRY, which also diminishes the concentration of the large complex CLOCK-BMAL1-PER-CRY and thus relieves the inhibition exerted by the complex on transcription of Per and Cry genes. Kinetic modelling of the circadian clock in Drosophila has shown the importance of this positive feedback loop for circadian rhythms [27].

Influence of parameter values

In order to check the robustness of the five sub-models, we perform a global sensitivity analysis on the output errors for each time window \(\left (e^{v}_{h}\right)\). We perform the analysis without propagating the errors because each sub-model is valid for a specific time window, independently from the other time windows. We vary 51 among the 56 parameters of the model: the Hill coefficients m and n are kept fixed because they represent the degree of cooperativity in gene repression/activation, while k stot , v stot , V phos are function of other parameters (see Appendix B). We hence compute the non normalised total sensitivity indices for all parameters according to Eq. (15) (see Fig. 9, first column). Because the last three outputs (P Tot ,C Tot ,B Tot ) are the sum of model variables that interact, some processes have no impact on these outputs and the information on the parameter influence is lost. We also perform the global sensitivity analysis on the 16 global relative errors between the original model and the sub-model variables without propagating errors (see Fig. 9, second column). The complex PER-CRY plays an important role in every time window: its variability is due mostly to its maximal phosphorylation velocity (V1PC) and its degradation parameter (v dPCC ). In the third and fourth time window the other important variation is due to the CRY protein: in SM3 the variation is mostly due to the binding constants in the transcription of Per and Cry mRNAs (K AP and K AC ) and in SM4, to the maximal translation rate of BMAL1 (k sB ) that stimulates Per and Cry mRNA transcription. In the last time window, lots of variables contribute to the system variation: the most important parameter for the variability of the outputs is the maximal velocity of BMAL1 phosphorylation in the nucleus (V3B).

Fig. 9
figure 9

Global sensitivity analysis on the output (left column) or variable (right column) errors between the original model and the sub-models without propagation error for each time window (lines). Non-normalised total sensitivity indices are represented for each error (one bar per error) and for: (i) the 10 most influential parameters (color-coded); (ii) the remaining parameters (white). The residual is also represented (grey). For the biological meaning of the variables in the second column, see the equations in

To get a more global view of the model simplification, we calculate, for each parameter combination and for each time window, the average error (averaged over the 16 variables) between the original model and the sub-model variables as follows:

$$ \bar{e}^{v}=\frac{1}{16}\sum\limits_{i=1}^{16} e^{v}_{i}. $$
(17)

Results are shown in Fig. 10. The variability is higher in the third and four sub-model, although the difference between the lower and upper quartiles is low in all sub-models.

Fig. 10
figure 10

Average error between the original model and the sub-model variables calculated in each time window according to Eq. (17). Variability (box-plots) within each sub-model (or time window) is due to the various parameter combinations designed for the sensitivity analysis

Then, for each time-window, we compute the total generalised sensitivity indices according to Eq. (16), which represents the fraction of error variability explained by each parameter when parameter values vary. The results are shown in Fig. 11: we obtain similar results to the ones in Fig. 9 (column 2): in SM1 and SM2, the maximal phosphorylation velocity (V1PC) and degradation (v dPCC ) of PER-CRY complex play the main role; in SM3, the binding constants of Per and Cry proteins (K AP and K AC ); in SM4, the translation of BMAL1 protein (k sB ) and in SM5, the maximal phosphorylation velocity of BMAL1 protein in the nucleus (V3B). In order to check whether the error variations between the original model and the sub-models are due to parameters appearing in neglected processes, we determine the following ratio: \(R^{v}_{h}=\frac {\sum \limits _{f\in \{ inactive~processes \}}tGSI_{f}^{h,v}}{\sum \limits _{f}tGSI_{f}^{h,v}}\). We only use the 10 most informative parameters, with higher tGSI, as they explain most variability. We choose a conservative option: if a parameter is neglected in an inactive process but still appears in other active processes, we still consider that it belongs to the neglected process parameters (worst case). Results are shown in Table 3. In most time windows, the variability is mainly due to parameters still contained in the reduced sub-models, i.e. the parameters of the active processes. In the third time-window, however, parameters appearing in neglected processes generate more than 50% of the variability. It is consistent with Fig. 7b: the peaks of the total concentration of PER and CRY are overestimated by the sub-model and some of the most important parameters that lead to the output variability for this time window are the translation rate of PER and CRY proteins, the maximal phosphorylation velocity of PER-CRY complex in the cytosol and nucleus (as it has been shown in Fig. 11). This confirms what we have supposed when applying the Dynamical Process Map to SM3 (see discussion about Fig. 8 at the end of “Creation of sub-models” section).

Fig. 11
figure 11

Generalised sensitivity indices (GSI) computed for each sub-model on the errors between the original model and the sub-model variables. The 10 most influential parameters on the errors are retained: main effect (grey bar) and total GSI (black bar)

Table 3 Percentage of tGSI for parameters contained in inactive processes

Discussion

A challenging task when analysing the dynamics of biological networks is to understand the relation between the network behaviour and its numerous processes, the activity of which is switched on and off by regulatory mechanisms. Model reduction is one possible approach to deal with model complexity and help deciphering the design principle of these networks. However, the reduced models can be still too complex to study and this does not answer the question on the role of each individual process in the network behaviour. Ideally, one would like to identify the major processes, quantify and then understand their contribution to the system dynamics. Principal Process Analysis was developed with this objective in mind, and with the final goal of simplifying the original model in one or several sub-models around core active processes that are responsible for the dynamics of the original system. The dynamics of the core processes is much more tractable in the sub-models than in the original one. Questions remained open though concerning the scalability and robustness of this approach.

In this paper we tested the scalability of Principal Process Analysis by applying the approach on a model of high dimension, the mammalian circadian clock model, which incorporates numerous processes and complex interlocked feedback loops responsible for oscillatory behaviours. Simplification of the original system dynamics to as much as 50% of its processes in five coupled sub-models helped us relate the dynamics of the simplified models to the system components and their active interactions. We hence observed that the negative feedback loop controlling Per and Cry transcription through the formation of the large complex PER-CRY-CLOCK-BMAL1 is the main oscillator, in agreement with previous experimental and modelling studies [17, 25, 26]. Principal Process Analysis has been also applied with success to diverse biogeochemical and biochemical models in our group and elsewhere, see [14, 21, 28, 29]. These case studies and the present one exemplify the applicability and scalability of the approach to models of diverse nature and and complexity.

In this paper, the quantification of global errors allowed us to conclude that the simplified models reproduce well the behaviour of the original ones. Even in the case of the largest errors observed on the model output, did the simplified models preserve the oscillations of the clock proteins. Since Principal Process Analysis is based on the a priori knowledge of the model parameters, it was important to assess the robustness of the approach to uncertainties on these parameter values. Through a global sensitivity analysis, we studied the impact of of parameter values on the error between the original model and the simplified sub-models. Not only was the variation of the error small, but it was mostly due to parameters of the neglected processes. With this analysis, we proved the robustness of PPA to parameter uncertainty. In addition, we provided clues to identify and solve potential troubles related to the model simplification, in order to decrease errors between the original model and the simplified sub-models.

In a recent study, we also showed by other methods the robustness of PPA to initial conditions [21]. The latter were supposed to lie in rectangles contained in a region of the variable space varying by one order of magnitude in each coordinate. Under additional assumptions on the monotonicity of biological processes within rectangles, the maximal bound of process weights was computed, which allowed identifying active processes in each rectangle, similarly to “Principal Process Analysis (PPA)” section, for which weight is above the threshold value δ. Based on the behaviour of processes on the edges of the rectangles, it was then possible to determine the transitions between rectangles and deduce the evolution of process activities along the different transitions. The method has been applied on a small gene expression network containing a negative feedback loop [21]. The same principles could be applied to show the robustness of the model to larger variations of its parameters. In this case, the parameter space should be divided in rectangles in which the activity/inactivity of processes is studied. Such extension of the method is part of a future work.

In the current state of development, Principal Process Analysis is not a model reduction approach. For instance, the elimination of the inactive processes from the original model breaks down the mass conservation relations when eliminating a process in one equation that is considered active in others. As long as the purpose of PPA is to analyse the important processes in the original model, this is not an issue. Nevertheless, the approach could be extended so as to preserve mass conservation relations. In addition, simplified models with much smaller global relative errors could be obtained, so that the simplified sub-models represent more accurately the original model. We are currently studying a refinement of PPA by considering three different levels of activities (inactive, active, fully active), defined by two different thresholds in order to improve the quality of the model simplification and model analysis. Such improvements could bring PPA closer to a reduction method, since the simplified models become accurate representations of the original model.

Conclusions

Mathematical models of biological systems have grown in complexity to include large numbers of processes. As a consequence, their contribution to the system dynamics becomes hardly tractable. The current manuscript contributes to this problem with the development of Principal Process Analysis. Provided the ODE model of the system is composed of a linear sum of terms describing each a process, the method enables the identification of the major processes contributing to the system dynamics and when they play a key role. Removing inactive processes allows restricting the dynamical analysis of the system to its core processes and facilitates the understanding of the system functioning. The conclusions derived with the method are robust to fluctuations in the parameter values. As such, Principal Process Analysis can be applied to any type of ODE models with the same form.

Appendix A: Estimate of errors

We give in this appendix a rough estimate of the a priori error, based on bounds in the model and simple lemmas to compare the solutions of two differential equations. We refer to [18, Chapter 3], for the basic notions.

Consider the following ODE model of biological network, as given in Eq. (2):

$$ \dot{x}=f\left(x,p\right) $$
(18)

Variable x is supposed to live in a bounded domain D of \(\mathbb {R}^{n}\), and all functions in f are supposed to be smooth enough (at least C2) and Lipshitz on D. We denote by L the Lipschitz constant of f(x,p) on D. The decomposition into processes gives:

$$ \dot{x}_{i}=\sum_{j=1}^{n_{i}}f_{ij}\left(x,p\right) \;\; i=1,\ldots,n $$
(19)

where f ij represents the jth process involved in the dynamical evolution of the ith variable of the system over a period of time [ t0,T], and n i is the number of processes for \(\dot {x}_{i}\).

The weights are computed during [ t0,T]. For the sake of simplicity, we suppose that, for each variable, the weight of the first process and only this weight is lower than threshold δ:

$$ W_{i1}(t,p)=\frac{|f_{i1}(x(t),p)|}{\sum_{j}^{n_{i}}|f_{ij}(x(t),p)|} < \delta \;\; i=1,\ldots,n. $$
(20)

It means that processes fi1,i=1,…,n are inactive during period [ t0,T], and thus eliminated from the system, giving the new simpler system (the new variable is denoted by y for simplicity):

$$ \dot{y}_{i}=\sum_{j=2}^{n_{i}}f_{ij}\left(y,p\right) \;\; i=1,\ldots,n. $$
(21)

As variables are assumed to be bounded, vector f1=(f11,…,f1n)t is such that:

$$ {}|f_{i1}(x,p)| \,=\, W_{i1}\;{\sum_{j=1}^{n_{i}}|f_{ij}(x(t),p)|} \leq \delta B_{i} \;\; \forall x \in D \;\; i\,=\,1,\ldots,n. $$
(22)

where B i is an upper bound for \(\sum _{j=1}^{n_{i}}|f_{ij}(x(t),p)|\) obtained from the variable bounds on domain D. All B i form vector B.

Therefore:

$$ ||f_{1}(x,p)|| \leq \delta ||B|| $$
(23)

If the initial conditions are the same (x(t0)=y(t0)), then Theorem 3.4 in [18], which is based on Gronwall’s Lemma, gives a bound between the two solutions x and y:

$$ ||x(t)- y(t)|| \leq \delta \; \frac{||B||}{L} \; { (e^{L(t-t_{0})}-1)} \; \forall t \in [t_{0}, T]. $$
(24)

The same proof applies when several f ij are inactive for some variables. One just need to sum the errors in Eq. (22).

This gives a rough bound between the two solutions; this bound is theoretical and conservative, and is not used in the practical a posteriori computation of the error in our work. Nevertheless, it shows that the error is roughly proportional to the threshold δ used in the weight computations.

Appendix B: Full mammalian model

Model equations

Equations listed in [17, 19].mRNAs of Per gene

$$\begin{array}{lcl} \frac{dM_{P}}{dt} & = & v_{sP} \frac{B^{n}_{N}}{K^{n}_{AP}+B^{n}_{N}}- v_{mP} \frac{M_{P}}{K_{mP}+M_{P}}-k_{dmp} M_{P} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \\ \dot{x_{1}} & = & f_{1,1}+f_{1,2}+f_{1,3} \\ \end{array} $$

mRNAs of Cry gene

$$\begin{array}{lcl} \frac{dM_{C}}{dt} & = & v_{sC} \frac{B^{n}_{N}}{K^{n}_{AC}+B^{n}_{N}}- v_{mC} \frac{M_{C}}{K_{mC}+M_{C}}-k_{dmc} M_{C}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \\ \dot{x_{2}} & = & f_{2,1}+f_{2,2}+f_{2,3} \\ \end{array} $$

mRNAs of Bmal1 gene

$$\begin{array}{lcl} \frac{dM_{B}}{dt} & = & v_{sB} \frac{K^{n}_{IB}}{K^{n}_{IB}+B^{n}_{N}}- v_{mB} \frac{M_{B}}{K_{mB}+M_{B}}-k_{dmb} M_{B}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \\ \dot{x_{3}} & = & f_{3,1}+f_{3,2}+f_{3,3} \\ \end{array} $$

Non-phosphorylated PER protein in the cytosol

$$\begin{array}{lcl} \frac{dP_{C}}{dt} & = & k_{sP} M_{P} - V_{1P} \frac{P_{C}}{K_{P}+P_{C}}+ V_{2P} \frac{P_{CP}}{K_{dP}+P_{CP}} \\&+& k_{4} PC_{C} -k_{3} P_{C} C_{C} - k_{dn} P_{C} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \dot{x_{4}} & = & f_{4,1}+f_{4,2}+f_{4,3}+f_{4,4}+f_{4,5}+f_{4,6} \\ \end{array} $$

Non-phosphorylated CRY protein in the cytosol

$$\begin{array}{lcl} \frac{dC_{C}}{dt} &=& k_{sC} M_{C} - V_{1C} \frac{C_{C}}{K_{P}+C_{C}}+ V_{2C} \frac{C_{CP}}{K_{dP}+C_{CP}}\\ &+& k_{4} PC_{C} -k_{3} P_{C} C_{C} - k_{dnc} C_{C} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \dot{x_{5}} & = & f_{5,1}+f_{5,2}+f_{5,3}+f_{5,4}+f_{5,5}+f_{5,6} \\ \end{array} $$

Phosphorylated PER protein in the cytosol

$${}\begin{array}{lcl} \frac{dP_{CP}}{dt} & = &V_{1P} \frac{P_{C}}{K_{P}+P_{C}}- V_{2P} \frac{P_{CP}}{K_{dP}+P_{CP}}- v_{dPC} \frac{P_{CP}}{K_{d}+P_{CP}} - k_{dn} P_{CP}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \\ \dot{x_{6}} & = & f_{6,1}+f_{6,2}+f_{6,3}+f_{6,4}\\ \end{array} $$

Phosphorylated CRY protein in the cytosol

$$\begin{array}{lcl} {}\frac{dC_{CP}}{dt} & = &\!\! V_{1C} \frac{C_{C}}{K_{P}+C_{C}}\,-\, V_{2C} \frac{C_{CP}}{K_{dP}+C_{CP}}\,-\, v_{dCC} \frac{C_{CP}}{K_{d}+C_{CP}} \,-\, k_{dn} C_{CP}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \\ \dot{x_{7}} & = & f_{7,1}+f_{7,2}+f_{7,3}+f_{7,4}\\ \end{array} $$

Non-phosphorylated PER-CRY complex in the cytosol

$$\begin{array}{lcl} \frac{dPC_{C}}{dt} & = &-V_{1PC} \frac{PC_{C}}{K_{P}+PC_{C}}+ V_{2PC} \frac{PC_{CP}}{K_{dP}+PC_{CP}}- k_{4} PC_{C}\\ & &+k_{3} P_{C} C_{C} + k_{2} PC_{N}- k_{1} PC_{C} - k_{dn} PC_{C}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \dot{x_{8}} & = & f_{8,1}+f_{8,2}+f_{8,3}+f_{8,4}+f_{8,5}+f_{8,6}+f_{8,7}\\ \end{array} $$

Non-phosphorylated PER-CRY complex in the nucleus

$$\begin{array}{lcl} {}\frac{dPC_{N}}{dt} & = &\!-V_{3PC} \frac{PC_{N}}{K_{P}+PC_{N}}\,+\, V_{4PC} \frac{PC_{NP}}{K_{dP}+PC_{NP}}\,-\, k_{2} PC_{N}\,+\,k_{1}\! PC_{C} \\ & &\!-k_{7} B_{N} PC_{N} + k_{8} I_{n} -k_{dn} PC_{N} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ {}\dot{x_{9}} & = & f_{9,1}+f_{9,2}+f_{9,3}+f_{9,4}+f_{9,5}+f_{9,6}+f_{9,7}\\ \end{array} $$

Phosphorylated PER-CRY complex in the cytosol

$$ {}\begin{array}{lcl} \frac{dPC_{CP}}{dt} \!\!& = &\!\!V_{1PC} \frac{PC_{C}}{K_{P}+PC_{C}}\,-\, V_{2PC}\! \frac{PC_{CP}}{K_{dP}+PC_{CP}}\,-\, v_{dPCC}\ \frac{PC_{CP}}{K_{d}+PC_{CP}}\,-\, k_{dn} PC_{CP}\\ \dot{x_{10}} \!& = &\! f_{10,1}+f_{10,2}+f_{10,3}+f_{10,4}\\ \end{array} $$

Phosphorylated PER-CRY complex in the nucleus

$$ {} \begin{array}{lcl} \frac{dPC_{NP}}{dt} \!\!& = &\!\!\!V_{3PC}\! \frac{PC_{N}}{K_{P}+PC_{N}}\,-\, V_{4PC}\! \frac{PC_{NP}}{K_{dP}+PC_{NP}}\,-\, v_{dPCN} \frac{PC_{NP}}{K_{d}\,+\,PC_{NP}}\,-\, k_{dn} PC_{NP}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \dot{x_{11}} \!& = &\! f_{11,1}+f_{11,2}+f_{11,3}+f_{11,4}\\ \end{array} $$

Non-phosphorylated BMAL1 protein in the cytosol

$$ {} \begin{array}{lcl} \frac{dB_{C}}{dt} & = & k_{sB} M_{B} \,-\, V_{1B} \frac{B_{C}}{K_{P}+B_{C}}\,+\, V_{2B} \frac{B_{CP}}{K_{dP}+B_{CP}} -k_{5} B_{C} +k_{6} B_{N} - k_{dn} B_{C} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \dot{x_{12}} & = & f_{12,1}+f_{12,2}+f_{12,3}+f_{12,4}+f_{12,5}+f_{12,6} \\ \end{array} $$

Phosphorylated BMAL1 protein in the cytosol

$${}\begin{array}{lcl} \frac{dB_{CP}}{dt} & = & V_{1B} \frac{B_{C}}{K_{P}+B_{C}}- V_{2B} \frac{B_{CP}}{K_{dP}+B_{CP}}- v_{dBC} \frac{B_{CP}}{K_{d}+B_{CP}} -k_{dn} B_{CP} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \dot{x_{13}} & = & f_{13,1}+f_{13,2}+f_{13,3}+f_{13,4} \\ \end{array} $$

Non-phosphorylated BMAL1 protein in the nucleus

$${}\begin{array}{lcl} \frac{dB_{N}}{dt} \!\!\!& = &\! \,-\,V_{3B} \frac{B_{N}}{K_{P}+B_{N}}\,+\,\! V_{4B} \frac{B_{NP}}{K_{dP}+B_{NP}}\,+\, k_{5} B_{C} \,-\, k_{6} B_{N} \,-\,k_{7} B_{N} PC_{N} \\ & &+k_{8} I_{N} -k_{dn} B_{N} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \dot{x_{14}} & = & f_{14,1}+f_{14,2}+f_{14,3}+f_{14,4}+f_{14,5}+f_{14,6}+f_{14,7} \\ \end{array} $$

Phosphorylated BMAL1 protein in the nucleus

$${}\begin{array}{lcl} \frac{dB_{NP}}{dt} & = & V_{3B} \frac{B_{N}}{K_{P}+B_{N}}\,-\, V_{4B} \frac{B_{NP}}{K_{dP}+B_{NP}} - v_{dBN} \frac{B_{NP}}{K_{d}+B_{NP}} -k_{dn} B_{NP}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \dot{x_{15}} & = & f_{15,1}+f_{15,2}+f_{15,3}+f_{15,4}\\ \end{array} $$

Inactive complex between PER-CRY and CLOCK-BMAL1 in the nucleus

$$\begin{array}{lcl} \frac{dI_{N}}{dt} & = & -k_{8} I_{N} + k_{7} B_{N} PC_{N} - v_{dIN} \frac{I_{N}}{K_{d} + I_{N}} - k_{dn} I_{N}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \dot{x_{16}} & = & f_{16,1}+f_{16,2}+f_{16,3}+f_{16,4}\\ \end{array} $$

Model parameters

Parameters listed in [17, p.546]: Set 1.

k1(h−1)=0.4, k2(h−1)=0.2, k3(nM−1h−1)=0.4, k4(h−1)=0.2, k5(h−1)=0.4, k6(h−1)=0.2, k7(nM−1h−1)=0.5, k8(h−1)=0.1, K AP (nM)=0.7, K AC (nM)=0.6, K IB (nM)=2.2, k dmb (h−1)=0.01, k dmc (h−1)=0.01, k dmp (h−1)=0.01, k dnc (h−1)=0.12, k dn (h−1)=0.01, K d (nM)=0.3, K dp (nM)=0.1, K p (nM)=0.1, K mB (nM)=0.4, K mC (nM)=0.4, K mP (nM)=0.31, k stot (h−1)=1.0, k sB (h−1)=0.12k stot , k sC (h−1)=1.6k stot , k sP (h−1)=0.6k stot , n=4, m=2, V phos (nMh−1)=0.4, V1B(nMh−1)=0.5, V1C(nMh−1) = 0.6, V1P(nMh−1) = V phos ,V1PC(nMh−1)=V phos , V2B(nMh−1)=0.1, V2C(nMh−1)=0.1, V2P(nMh−1)=0.3, V2PC(nMh−1)=0.1, V3B(nMh−1)=0.5, V3PC(nMh−1)=V phos , V4B(nMh−1)=0.2, V4PC(nMh−1)=0.1, v dBC (nMh−1)=0.5, v dBN (nMh−1)=0.6v dCC (nMh−1)=0.7, v dIN (nMh−1)=0.8, v dIN (nMh−1)=0.8, v dPC (nMh−1)=0.7, v dPCC (nMh−1) = 0.7,v dPCN (nMh−1) = 0.7,v mB (nMh−1) =0.8, v mC (nMh−1)=1.0, v mP (nMh−1)=1.1, vstot(nMh−1)=1.0, v sB (nMh−1)=v stot , v sB (nMh−1)=v stot , v sC (nMh−1)=1.1v stot , v sP (nMh−1)=1.5v stot

Table 4 Switching times (s.t), their values (v.) in [h] and associate reduced (cluster) switching times\(\left (t^{r}_{1},t^{r}_{2},t^{r}_{3},t^{r}_{4}\right)\) (s.t.c.): \(t_{1}^{r}\) is associated to the cluster of \(t_{1}- t_{6}, t^{r}_{2}\) to \(t_{7} - t_{15}, t^{r}_{3}\) to t16t26, and \(t^{r}_{4}\) to t27t46

Initial conditions

The unit of the initial conditions is nM.

M P (0)=2.188M C (0)=1.633,M B (0)=9.498,P C (0)=2.008,C C (0)=1.884,P CP (0)=0.129,C CP (0)=0.473, PC C (0)=1.228,PC N (0)=0.177,PC CP (0)=0.203, PC NP (0)= 0.101,B C (0) =2.523,B CP (0)=0.929,B N (0)=1.787,B NP (0)=0.318,I N (0)=0.051

Appendix C: Switching times

See Table 4.

Appendix D: Neglected processes

First reduced model

Neglected processes are: f1,3, f2,3, f3,3, f4,6, f5,3, f5,6, f6,4, f7,4, f8,2, f8,7, f9,6, f9,7, f10,4, f11,4, f12,3, f12,6, f13,2, f13,4, f14,2, f14,6, f14,7, f15,4, f16,1, f16,4.

Second reduced model: sub-models

Neglected processes in SM1 are: f1,3, f2,3, f3,3, f4,6, f5,3, f5,6, f6,4, f7,4, f8,2, f8,7, f9,6, f9,7, f10,4, f11,4, f12,3, f12,6, f13,2, f13,4, f14,2, f14,6, f14,7, f15,4, f16,1, f16,4.

In SM2, we supposed that processes switching state from t1=0.33 until t6=1.5 change simultaneously at time \(t_{1}^{r}=0.9\). Deleted processes are common to those removed in SM1, as well as: f4,3, f4,4, f5,4, f7,2, f8,3,f8,5, f9,2, f9,3, f10,2, f14,5.

In SM3, we supposed that processes switching state from t7=3.8 until t6=1.5 change simultaneously at time \(t_{2}^{r}=6\). Deleted processes are common to those removed in SM1, as well as: f1,1, f2,1, f4,2, f4,3, f5,2, f7,2, f8,1, f9,1, f9,2, f10,2, f11,2, f12,5, f14,4.

In SM4, we supposed that processes switching state from t16=9.8 until t26=15.6 change simultaneously at time \(t_{3}^{r}=12.5\). Deleted processes are common to those removed in SM1, as well as: f4,1, f5,1, f8,4, f9,2, f10,2, f11,2,f12,5, f14,4.

In SM5, we supposed that processes switching state from t27=17.3 until t46=22.9 change simultaneously at time \(t_{4}^{r}=20.0\). Deleted processes are common to those removed in SM1, as well as: f4,3, f4,4, f5,4, f7,2, f8,3, f8,5, f9,2, f9,3, f10,2, f14,5.

Appendix E: Dynamical Process Maps

See Figs. 12, 13, 14, 15 and 16.

Fig. 12
figure 12

Dynamical Process Map of SM1

Fig. 13
figure 13

Dynamical Process Map of SM2

Fig. 14
figure 14

Dynamical Process Map of SM3

Fig. 15
figure 15

Dynamical Process Map of SM4

Fig. 16
figure 16

Dynamical Process Map of SM5