1 Introduction

In systems biology, dynamics play a crucial role in cellular decision making (e.g. whether a cell responds appropriately to a particular signal) (Voit 2017). Molecular interactions can be modelled as systems of chemical reactions with a choice of kinetics, such as the law of mass action, which assumes that the rate at which a chemical reaction proceeds is proportional to the product of the concentrations of its reactants. From a finite set of reactions, the mass-action modelling assumption gives rise to a system of polynomial ordinary differential equations (ODEs), in which each right-hand side is a sum of monomials whose variables are the concentrations of molecular species and whose coefficients are reaction rate constants. Chemical reaction network theory (CRNT) is a mathematical field developed by Horn and Jackson, and independently by Bykov, Gorban, Volpert and Yablonsky, for analysing such reactions; the mathematical techniques employed extend beyond dynamical systems theory to include algebraic geometry, differential algebra, algebraic statistics and discrete mathematics (Dickenstein 2016).

CRNT often focuses on steady-state analysis through the lens of computational and real algebraic geometry, asking questions about the capability or preclusion of multiple positive real steady-states (i.e. multistationarity) or more complex dynamics, often without requiring specialised parameter values (Banaji and Craciun 2009; Craciun and Feinberg 2005; Millán et al. 2012; Angeli 2009; Wang and Sontag 2008; Feliu and Wiuf 2012; Müller 2016; Conradi and Pantea 2019). Multi-site protein phosphorylation systems, such as the ERK/MEK signalling pathway, can be translated into such chemical reactions and their multistationarity, corresponding to different biological cellular decisions, has attracted much attention (Thomson and Gunawardena 2009b; Gunawardena 2007; Aoki 2011; Takahashi et al. 2010; Markevich et al. 2004). Algebraic analyses and invariants of multi-site phosphorylation have revealed geometric information of steady-state varieties, informed experimental design and enabled model comparison using steady-state data (Manrai and Gunawardena 2008; Thomson and Gunawardena 2009a; Harrington et al. 2012; Gross et al. 2016; MacLean et al. 2015). However, such systems have also been shown to exhibit nontrivial transient dynamics and oscillations (Conradi et al. 2019; Qiao et al. 2007). In recent years, the fields of systems biology and CRNT have extended the repertoire of techniques to assert other dynamics (Banaji 2020; Conradi et al. 2019; Domijan and Kirkilionis 2009; Mincheva and Roussel 2007; Kay 2017; Errami 2015; Angeli et al. 2013), reduce models systematically (Pantea et al. 2014; Goeke et al. 2017; Feliu et al. 2019; Sweeney 2017; Boulier et al. 2011; Hubert and Labahn 2013), and assess identifiability (Ljung and Glad 1994; Ollivier 1990; Meshkat et al. 2009; Hong et al. 2020; Bellu et al. 2007). Furthermore, combinatorial structures, such as simplicial complexes, and techniques from computational algebraic topology have enabled comparison of chemical reaction network models and their parameters (Vittadello and Stumpf 2020; Nardini et al. 2020).

A previous algebraic systems biology case study (Gross et al. 2016) analysed a chemical reaction network model at steady state, by studying the steady-state ideal, chamber complex and algebraic matroids of the model. Here we present a sequel to that analysis, studying the dynamics of chemical reaction networks with time-course data; this relies on studying the QSS variety (Sect. 3), the model prediction map (Sect. 4) and the topology of parameter inference (Sect. 5).

We perform a detailed mathematical analysis of recently published models and experimental data (Yeung et al. 2019). The Full ERK model describes dual phosphorylation of ERK by MEK, two molecular species whose activation regulates cell division, cell specialisation, survival and cell death (Shaul and Seger 2007). The dynamics of the six ERK/MEK molecular species \(x\in {{\,\mathrm{{\mathbb {R}}}\,}}^{n=6}\) in the Full ERK model are governed by a polynomial dynamical system \({\dot{x}}(t) = f(x(t),\theta )\), where \(\theta \in {{\,\mathrm{{\mathbb {R}}}\,}}^{m=6}\) is the vector of parameters and there are two conservation relations between the species. The Full ERK model is presented in Sect. 2. Analysing the kinetic parameters of a model depends on the available data. The accompanying time-course experimental observations include measurements of ERK in 3 different states, at 7 time points following activation by its activated enzyme kinase MEK, which is either wild-type (WT) or mutated. Mutations of MEK are known to be involved in human cancer and embryonic developmental defects; therefore, understanding their kinetics and the differences between wild-type MEK and 4 mutants (e.g. Y130C, F53S, E203K or SSDD) may increase fundamental biological understanding of the pathway and contribute to the development of potential therapies. The experimental data and relevant biological information are presented in Sect. 2.

Using algebraic approaches first presented by Goeke, Walcher and Zerz in Goeke et al. (2017), we decrease the number of variables and parameters in the Full ERK model. We derive two model reductions: the Rational ERK model and the Linear ERK model. We show, with known biological information (see Sect. 2), that the reduction to the Linear ERK model by Yeung et al. (2019) is mathematically sound. We note that the Rational ERK model was not analysed in Yeung et al. (2019), although it can be derived from the Full ERK model using singular perturbation methods. A natural question is whether a quasi-steady-state approximation is justified given the experimental setup, which equates to solving an algebraic problem (Goeke et al. 2017). We identify algebraic varieties \(V_\theta \) that are (analytic) invariant sets of the ODE system and characterise neighbourhoods in parameter space for which the ODE solutions remain close to these varieties. This systematic analysis allows us to simplify the model equations such that the dynamics of both reduced models are good approximations to the Full ERK model. Algebraic model reduction and derivation of the reduced ERK models are given in Sect. 3.

Before estimating the parameters of a model from observations, one must determine its identifiability. Identifiability is concerned with asking whether it is possible to recover values of the model parameters given data. A model is structurally identifiable if parameter recovery is possible with perfect data. Mathematically, this task is equivalent to asking whether the model prediction map is injective. The model prediction map, defined precisely in Sect. 4.1, is a map that takes a parameter to the corresponding predicted noise-free data point(s) (Dufresne et al. 2018). Real data is often noisy; testing whether parameter recovery is possible with imperfect data is the problem of practical identifiability (Raue et al. 2009; Dufresne et al. 2018). Mathematically, measurement noise induces a probability distribution in data space. Assuming that the model prediction map is injective (at least generically), practical identifiability can be defined in terms of the boundedness (with respect to a reference metric in parameter space) of the confidence regions of a likelihood test. Under our assumptions, this translates to asking whether the preimages of small bounded regions in data space are bounded in parameter space. We prove the following:

Theorem 1

The Linear ERK/MEK model, with the given experimental setup (number of species, number of replicates, number of measurement time-points and initial conditions), is structurally and practically identifiable.

We provide a definition of practical identifiability that improves on a previous definition (Dufresne et al. 2018) and is an alternative to that of Raue et al. (2009). We also propose a computable algorithm for practical identifiability, implement it and apply it to the ERK models. We prove Theorem 1 in Sect. 4.

We use the differential algebra method to show that the Full ERK model and the Rational ERK model are generically structurally identifiable. By Sontag’s result (Sontag 2002), these results are guaranteed to be valid if we have at least \(2m+1\) generic time points, but they can remain valid with fewer. Indeed, as the Linear ERK model admits analytic solutions, we can prove that it is globally structurally identifiable for any choice of three distinct time points. Determining structural identifiability for specific time points in the absence of analytic solutions is an open problem.

We numerically show that the Full ERK model and Rational ERK model are not practically identifiable; however, the source of this practical non-identifiability is not completely clear (see Sect. 4).

Finally, for a model that is structurally and practically identifiable, one would like to infer the parameters, i.e. to determine which parameter values are consistent with the observations. We perform Bayesian inference, as done in Yeung et al. (2019), and extend this to the Rational ERK model. The result of the parameter inference on the Linear ERK model is a point cloud of samples from the posterior density of the ERK kinetic parameters that are consistent with the data. We obtain five such sample densities, corresponding to the five MEK variants.

In Sect. 5, we compare the geometry of the admissible regions of parameter space for the five MEK variants. The computational field of topological data analysis quantifies the shape and connectivity of metric data by computing topological invariants across resolutions (or threshold values). In recent years, topological methods have seen dramatic improvements in computational speed, as well as theoretical advances that facilitate the analysis of scientific datasets. We implement a theoretical framework originally proposed by Taylor (2019a), in order to quantify the shape of the resulting posterior distributions of kinetic parameters and facilitate a comparison between mutants. Specifically, their theorem provides a basis for hypothesis testing of two densities using distances between topological summaries. The framework relies on approximating the persistent homology of super-level sets of posterior densities by simplicial complexes. We perform these measurements on the distributions obtained from Bayesian parameter inference for the five MEK variants and compare them via a topological bottleneck distance.
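As a minimal computational illustration of this pipeline (our sketch, not the implementation used in Sect. 5 or the framework of Taylor (2019a)): estimate each posterior density from its parameter sample with a kernel density estimate, compute the persistent homology of its super-level sets on a cubical grid, and compare two densities via the bottleneck distance. The synthetic samples, grid, resolution and the use of the gudhi library are all our assumptions.

```python
import numpy as np
import gudhi
from scipy.stats import gaussian_kde

def superlevel_diagram(sample, grid_min, grid_max, res=30):
    """Dimension-0 persistence of super-level sets of a KDE fitted to `sample`.

    `sample` has shape (n_params, n_points). Super-level sets of the density are
    the sub-level sets of its negative, which is what the cubical complex filters.
    """
    kde = gaussian_kde(sample)
    axes = [np.linspace(lo, hi, res) for lo, hi in zip(grid_min, grid_max)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=0)
    density = kde(grid.reshape(len(axes), -1)).reshape([res] * len(axes))
    cc = gudhi.CubicalComplex(top_dimensional_cells=-density)  # negate: super-level filtration
    cc.persistence()
    diag = cc.persistence_intervals_in_dimension(0)
    return diag[np.isfinite(diag).all(axis=1)]  # drop the essential (infinite) class

# Hypothetical posterior samples for two MEK variants (3 kinetic parameters each).
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=[0.10, 0.20, 0.5], scale=0.02, size=(500, 3)).T
sample_b = rng.normal(loc=[0.12, 0.18, 0.7], scale=0.02, size=(500, 3)).T

lo, hi = [0.0, 0.0, 0.0], [0.3, 0.4, 1.0]
d_a = superlevel_diagram(sample_a, lo, hi)
d_b = superlevel_diagram(sample_b, lo, hi)
print("bottleneck distance:", gudhi.bottleneck_distance(d_a, d_b))
```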

Biological Result

The topological data analysis shows that the Linear ERK model parameter posteriors differ most between the WT and SSDD data. The kinetics of the SSDD variant, which mimics phosphorylated MEK, have the largest topological distance from those of all the other MEK variants.

This biological result raises the question of whether the SSDD variant is a suitable approximation for wild-type MEK activated by Raf, and suggests further experimental studies are needed. While the previous analysis by Yeung et al. (2019) compared the variants by the inferred kinetics of each parameter, here we complement that analysis by comparing the three parameters together as a point cloud.

Our aim is to showcase how systematic algebraic, geometric and topological approaches can be applied to a biologically relevant model with state-of-the-art experimental time-course data. Each of these approaches incorporates the structure of the mathematical model, the experimental setup and observations (e.g. experimental initial conditions, observable species, number of experimental replicates, number of time points collected, etc.), as well as known biological information (e.g. published parameter values). Due to the multiple disciplines and different notation conventions (as well as standard abbreviations), we include a glossary of symbols at the start of the paper. The framework we present is not limited to this case study and may enhance the analysis of similar models in systems and synthetic biology.

2 From ERK Biochemical Reactions to a Polynomial Dynamical System

Protein phosphorylation alters protein function in signalling pathways and plays a crucial role in cellular decisions and homeostasis. Phosphorylation is the addition of a phosphate group by an enzyme known as a kinase, and dephosphorylation is the removal of a phosphate group by an enzyme known as a phosphatase. Multisite phosphorylation is the phosphorylation of a protein at multiple sites, which increases the number of ways in which protein function can be altered. The algebra, geometry, combinatorics and dynamics of multisite phosphorylation have been a source of interesting mathematical problems (Dickenstein 2016; Manrai and Gunawardena 2008; Conradi and Pantea 2019). A protein with q phosphorylation sites has \(2^q\) phospho-states, and its sites can be phosphorylated in \(q!\) possible orders (Thomson and Gunawardena 2009b). One of the simplest multisite phosphorylation systems arises when a protein has two phosphorylation sites. We focus on the sequential dual phosphorylation of the extracellular signal-regulated kinase (ERK) by its activated (dually phosphorylated) kinase MEK. The model developed by Yeung et al. (2019) encodes a mixed phosphorylation mechanism (i.e. distributive and processive) by changes in parameter values rather than by separate models (see, for example, Conradi and Shiu 2015; Gunawardena 2007 and references therein). This enabled them to quantify the extent to which a MEK variant is processive or distributive. We remark that the model presented by Yeung et al. does not include dephosphorylation mechanisms, since the experimental setup omitted the addition of phosphatases.

Next, we introduce the model and the experimental data published by Yeung et al. (2019).

2.1 The Model

The protein substrate ERK is activated through dual phosphorylation by its activated enzyme kinase MEK. As shown in the chemical reaction network (see Fig. 1), unphosphorylated ERK (\(S_0\)) binds reversibly with its kinase MEK (E) to form an intermediate complex \(C_1\). The complex becomes \(C_2\) when a phosphate group is added. Complex \(C_2\) can then dissociate to form MEK (E) and ERK phosphorylated on the tyrosine site (\(S_1\)), or a second phosphate group can be added to \(C_2\), resulting in the products \(E+S_2\). The six species and six rate constants are given in the following chemical reaction network (Fig. 1).

Fig. 1 The reaction network associated with dual phosphorylation of ERK by its activated enzyme kinase MEK

We can translate this reaction network into a dynamical system \({\dot{x}} = f(x,\theta )\). Here, f is a vector-valued function of the vector of species concentrations \(x= (S_0, C_1, C_2, S_1, S_2,E)\) and of the vector of rate constants, referred to as parameters, \(\theta = (k_{f_1},k_{r_1},k_{c_1},k_{f_2}, k_{r_2},k_{c_2})\). The kinetics assumption for f is a modelling choice; here we assume that the law of mass action holds (Klipp et al. 2016, §2.1.1), as for the original model (Yeung et al. 2019). The resulting dynamical system of ODEs is given in Eqs. (1).

$$\begin{aligned} \frac{\mathrm dS_0}{\mathrm dt}&= -k_{f_1}E\cdot S_0+k_{r_1}C_1, \end{aligned}$$
(1a)
$$\begin{aligned} \frac{\mathrm dC_1}{\mathrm dt}&= k_{f_1}E\cdot S_0 -(k_{r_1}+k_{c_1})C_1, \end{aligned}$$
(1b)
$$\begin{aligned} \frac{\mathrm dC_2}{\mathrm dt}&= k_{c_1}C_1-(k_{r_2}+k_{c_2})C_2 + k_{f_2}E\cdot S_1, \end{aligned}$$
(1c)
$$\begin{aligned} \frac{\mathrm dS_1}{\mathrm dt}&= -k_{f_2}E\cdot S_1+k_{r_2}C_2, \end{aligned}$$
(1d)
$$\begin{aligned} \frac{\mathrm dS_2}{\mathrm dt}&=k_{c_2}C_2, \end{aligned}$$
(1e)
$$\begin{aligned} \frac{\mathrm dE}{\mathrm dt}&= -k_{f_1}E\cdot S_0+k_{r_1}C_1-k_{f_2}E\cdot S_1 +(k_{r_2}+k_{c_2})C_2. \end{aligned}$$
(1f)

We assume that initially all species are zero, except for \(S_0(t=0)=S_{tot}\) and \(E(t=0)=E_{tot}\). Equations (2)–(3) define two conserved quantities that constitute a basis for the linear space of conservation relations of the model:

$$\begin{aligned} S_{tot}&=S_0+S_1+S_2+C_1+C_2, \end{aligned}$$
(2)
$$\begin{aligned} E_{tot}&=E+C_1+C_2, \end{aligned}$$
(3)

where the total amounts of substrate ERK (\(S_{tot}\)) and enzyme MEK (\(E_{tot}\)) are constant and known from the initial conditions.
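For readers who wish to explore the Full ERK model numerically, the following sketch integrates Eqs. (1) with scipy and checks the conservation relations (2)–(3). The rate constants are placeholder values chosen only for illustration (they satisfy \(k_{M_i}=25\mu M\)); the initial conditions and time points match the experimental setup described in Sect. 2.2.

```python
import numpy as np
from scipy.integrate import solve_ivp

def full_erk_rhs(t, x, kf1, kr1, kc1, kf2, kr2, kc2):
    """Right-hand side of the Full ERK model, Eqs. (1); x = (S0, C1, C2, S1, S2, E)."""
    S0, C1, C2, S1, S2, E = x
    dS0 = -kf1 * E * S0 + kr1 * C1
    dC1 = kf1 * E * S0 - (kr1 + kc1) * C1
    dC2 = kc1 * C1 - (kr2 + kc2) * C2 + kf2 * E * S1
    dS1 = -kf2 * E * S1 + kr2 * C2
    dS2 = kc2 * C2
    dE = -kf1 * E * S0 + kr1 * C1 - kf2 * E * S1 + (kr2 + kc2) * C2
    return [dS0, dC1, dC2, dS1, dS2, dE]

# Placeholder rate constants (illustrative only); kf in 1/(uM*min), others in 1/min,
# chosen so that k_Mi = (kci + kri)/kfi = 25 uM as reported in Sect. 2.2.2.
theta = dict(kf1=2.0, kr1=25.0, kc1=25.0, kf2=2.0, kr2=25.0, kc2=25.0)
S_tot, E_tot = 5.0, 0.65                      # uM, from the experimental setup
x0 = [S_tot, 0.0, 0.0, 0.0, 0.0, E_tot]       # S0(0)=S_tot, E(0)=E_tot, rest zero
t_obs = [0.5, 2, 3.25, 3.75, 5, 10, 20]       # minutes (WT measurement times)

sol = solve_ivp(full_erk_rhs, (0, 20), x0, args=tuple(theta.values()),
                t_eval=t_obs, method="LSODA", rtol=1e-8, atol=1e-10)
S0, C1, C2, S1, S2, E = sol.y
# The conservation relations (2)-(3) hold along the numerical trajectory:
assert np.allclose(S0 + S1 + S2 + C1 + C2, S_tot, atol=1e-6)
assert np.allclose(E + C1 + C2, E_tot, atol=1e-6)
```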

We aim to study the relationship between the species x, parameters \(\theta \), conserved quantities and available biological information (previous knowledge and experimental observations). The emphasis in this paper is not to analyse the steady-state variety as in Gross et al. (2016), rather here we focus on the transient dynamics of the model and algebraic approaches to analyse ERK kinetics in light of the available biological information.

2.2 The Data

The data are published; details of the measurement techniques and experimental methods can be found in Yeung et al. (2019). Here we present the experimental setup for the data we analyse.

2.2.1 Experimental Setup and Data

In all experiments, \(0.65\mu M\) of free (activated) enzyme MEK (E) was added to \(5\mu M\) of unphosphorylated ERK substrate \((S_0)\) along with ATP; therefore, \(S_{tot}=5\mu M\) and \(E_{tot}=0.65\mu M\). ERK was measured in three states: unphosphorylated \((S_0+C_1)\), mono-phosphorylated \((S_1+C_2)\) and dually phosphorylated \((S_2)\), at 7 time points, with r experimental replicates. The sample space for each MEK variant is \(X={{\,\mathrm{{\mathbb {R}}}\,}}^{3\times 7\times r}\), where for human wild-type (WT) MEK, \(r=11\); for the phosphomimetic MEK variant (SSDD), \(r=6\); and for the activating mutants, \(r=5\). The three activating mutants of MEK are known to be involved in human cancer (E203K) or developmental abnormalities (F53S and Y130C). The ERK observations were collected at seven time points \(t=\{0.5, 2, 3.25, 3.75, 5, 10, 20\}\) minutes for all MEK variants except SSDD, for which they were collected at \(t= \{1, 2, 3.25, 5, 10, 20, 40\}\) minutes.
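For concreteness, the experimental design can be encoded as follows (a bookkeeping sketch only; the variant labels and array layout are our own choices):

```python
# Measurement time points (minutes) and replicate counts for each MEK variant.
WT_TIMES = [0.5, 2, 3.25, 3.75, 5, 10, 20]
TIME_POINTS = {"WT": WT_TIMES, "Y130C": WT_TIMES, "F53S": WT_TIMES,
               "E203K": WT_TIMES, "SSDD": [1, 2, 3.25, 5, 10, 20, 40]}
REPLICATES = {"WT": 11, "SSDD": 6, "Y130C": 5, "F53S": 5, "E203K": 5}
S_TOT, E_TOT = 5.0, 0.65   # uM: total ERK substrate and total activated MEK

# One dataset per variant lives in R^(3 x 7 x r): three observed ERK states
# (S0+C1, S1+C2, S2) at seven time points in r replicates.
data_shape = {v: (3, 7, r) for v, r in REPLICATES.items()}
print(data_shape["WT"])   # (3, 7, 11)
```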

2.2.2 Known Biological Information

The relationship between some kinetic rate constants is known. When a substrate binds reversibly to an enzyme to form an enzyme-substrate complex, which then reacts irreversibly to form a product and release the enzyme, one can define the Michaelis–Menten constant \(k_{M}\). In the reaction network given by Eqs. (1), there are two Michaelis–Menten constants \(k_{M_i}=(k_{c_i}+k_{r_i})/k_{f_i}\) for \(i=1,2\). Measurements show that in our experimental setup \(k_{M_i}\approx 25\mu M\) for \(i=1,2\) (Taylor 2019b). While the reaction rates \(k_{c_i}\) and \(k_{r_i}\) for \(i=1,2\) cannot be measured directly, they have been shown to be of the same order of magnitude (Bar-Even 2011). We will use these insights to treat \(S_0\), \(S_1\) and \(S_2\) as the measured quantities (rather than the compound quantities \(S_0+C_1\) and \(S_1+C_2\)). We justify this mathematically in Sect. 3.3.4.

3 Algebraic Model Reduction

The first step in studying a model of this kind typically involves model reduction, which decreases the number of dependent variables and constant parameters. For many chemical reactions, there are time scales on which the rate of change of some variables is negligible and their dynamics are dominated by those of the remaining variables. This observation motivates the quasi-steady-state approximation (QSSA).

In recent years, algebraic approaches to reducing polynomial ODE models have been extended by Walcher and colleagues. Pantea et al. (2014) used Galois theory to characterise chemical reaction networks for which no explicit QSSA reduction is possible, and provided computational tools for determining the feasibility of an explicit reduction. Subsequently, Sweeney (2017) proved that the non-solvability of polynomials poses no issue for the CRNs most commonly encountered in practice, and derived a more efficient algorithm for determining explicit reducibility by translating algebraic structures into graphs. Goeke and Walcher (2014) provided an explicit formula for obtaining a reduced QSSA model using a subset of an algebraic variety defined by the slow manifold. Subsequently, Goeke et al. (2017) characterised parameter values at which a QSSA reduction is accurate, using algebraic varieties and bounds on the polynomials governing the ODE system on a bounded parameter and variable domain. Most recently, Feliu et al. (2019) derived necessary and sufficient conditions for purely algebraic reductions of a CRN model to agree with model reductions derived via classical singular-perturbation theory (Keener and Sneyd 2011; Segel 1988).

In this section, we briefly review QSSA using classical singular-perturbation theory as well as the algebraic approaches developed by Goeke et al. (2017) and Goeke and Walcher (2014). We then apply both methods to the Full ERK model (Eqs. (1)). We show that both approaches can generate the same QSSA reduction of our ERK model, which we call the Rational ERK model. Additionally, the algebraic method can yield a linear QSSA reduction of our ERK model in a single step (which we call the Linear ERK model). By contrast, the singular-perturbation-theory approach requires additional assumptions on parameter values to arrive at the Linear ERK model (see Sect. 3.3). We show that the Linear ERK model approximates the Full ERK model (Eqs. (1)) with similar accuracy to the Rational ERK model in the context of the experimental setup, data and known biological information (see Sect. 3.3 and Appendix A.2 for details).

With the algebraic method, we provide a rigorous mathematical justification of the Linear ERK model presented by Yeung et al. (2019). By comparing the singular perturbation method with the algebraic method and the two resulting model reductions, we illustrate how the algebraic methods form a well-structured approach for arriving at a QSS reduction and for assessing the accuracy of such reductions systematically.

3.1 Notation for Model Reduction

Throughout, we will assume we have an ODE system in variables \(x\in {{\,\mathrm{{\mathbb {R}}}\,}}^n\) and parameters \(\theta \in {{\,\mathrm{{\mathbb {R}}}\,}}^m\). If the system dynamics are governed by f, a vector of polynomials in \({{\,\mathrm{{\mathbb {R}}}\,}}[x,\theta ]^n\), then our ODE system is given by

$$\begin{aligned} \frac{\textrm{d} x}{\textrm{d}t} = f(x,\theta ). \end{aligned}$$
(4)

For \(1\le q<n\), we may define

$$\begin{aligned} x^{[1]}&=(x_1,\ldots ,x_q),&\quad f^{[1]}&=(f_1,\ldots ,f_q),\\ x^{[2]}&=(x_{q+1},\ldots ,x_n),&\quad f^{[2]}&=(f_{q+1},\ldots ,f_n). \end{aligned}$$

We wish to retain the variables \(x^{[1]}\) in the reduced model and seek to eliminate variables \(x^{[2]}\) as part of our model reduction.

For the full ERK model (Eqs. (1)), we choose \(x^{[1]}:=(S_0, S_1, S_2)\) and \(x^{[2]}:=(C_1, C_2)\). Analogously, \(f^{[1]}\) are the polynomials governing the rates of change of \(S_0\), \(S_1\) and \(S_2\) (Eqs. (1a), (1d) and (1e)) and \(f^{[2]}\) are the polynomials governing the rates of change of \(C_1\) and \(C_2\) (Eqs. (1b) and (1c)).

Remark

In the current section, we treat the (non-zero) initial conditions of the ODE systems as parameters (and include them in the parameter count m), as they are central to determining the goodness of a model reduction. In Sect. 4 (Identifiability) and Sect. 5 (Inference & Comparison), we will not include the initial conditions in the set of parameters, as they are given by the experimental setup and, as such, do not need to be identified or inferred.

3.2 The Algebraic QSSA Approach

The algebraic approach to QSSA, as presented by Goeke, Walcher and Zerz in Goeke et al. (2017), differs from the classical approach in several ways. Most notably, an a priori separation of time scales is not needed. On the other hand, we require a choice of fast and slow variables (i.e., a choice of which variables we eliminate, and which we retain in the reduced model).

Remark

To the best of our knowledge, all existing algebraic approaches to QSSA, including (Goeke et al. 2011; Goeke and Walcher 2014; Boulier et al. 2011; Goeke et al. 2017), require a choice, explicit or implicit, of slow and fast variables. In Goeke et al. (2011) the relevant choice is made by expanding f (in Goeke et al. (2011): h) at different orders of \(\varepsilon \).

First, we characterise points in parameter space, i.e. parameter values, at which the fast variables are exactly determined by the slow variables, which yields a reduced model. For a fixed parameter value, the vanishing set of the polynomials governing the ODEs of the fast variables defines an algebraic variety in the space of species concentrations; a parameter value is of the desired type when this variety is invariant under the dynamics. Typically, the ODE system will be degenerate at such parameter values. Secondly, we characterise neighbourhoods of these values in parameter space, as well as time scales, for which the reduction is a good approximation to the original model.

To describe the characterisation from Goeke et al. (2017), we use \(x^{[1]}\), \(x^{[2]}\), \(f^{[1]}\) and \(f^{[2]}\) as before. In addition, we denote the partial derivative with respect to \(x^{[i]}\) by \(D_i\). For a fixed \(\theta ^*\in {{\,\mathrm{{\mathbb {R}}}\,}}^m\), we let \(Y_{\theta ^*}\) denote the algebraic variety defined by \(f^{[2]}(\,\cdot \,,\theta ^*)\).

Definition 2

Let \(y\in Y_{\theta ^*}\) be such that the \((n-q)\times (n-q)\) matrix \(D_2 f^{[2]}\) has full rank at \((y,\theta ^*)\). Then we denote by \(V_{\theta ^*}\subseteq Y_{\theta ^*}\) a relatively Zariski-open neighbourhood of y in which this rank is maximal. We call \(V_{\theta ^*}\) a quasi-steady-state (QSS) variety in the sense of Goeke et al. (2017) and may assume without loss of generality that it is irreducible.

If, furthermore, \(V_{\theta ^*}\) is an invariant set of the ODE system \({\mathrm dx/\mathrm dt}=f(x,\theta ^*)\), then we call \(\theta ^*\) a QSS parameter value. Recall that, in dynamical systems theory, a set \(V_{\theta ^*}\subseteq {{\,\mathrm{{\mathbb {R}}}\,}}^n\) is an invariant set of an ODE system if, whenever the initial condition at \(t=0\) lies in \(V_{\theta ^*}\), the corresponding trajectory remains in \(V_{\theta ^*}\) for all \(t>0\).

Remark

Note that the steady-state variety (see Gross et al. 2016) and the QSS variety at a parameter value \(\theta ^*\) are not as closely related as one may first think. Indeed with our notation, the steady state variety is the zero set in \({{\,\mathrm{{\mathbb {R}}}\,}}^n\times {{\,\mathrm{{\mathbb {R}}}\,}}^m\) of the ideal \(\langle f^{[1]}(x,\theta ),f^{[2]}(x,\theta )\rangle \) of \({{\,\mathrm{{\mathbb {R}}}\,}}[x,\theta ]\), while the QSS variety at \(\theta ^*\) is contained in the zero set in \({{\,\mathrm{{\mathbb {R}}}\,}}^n\times \{\theta ^*\}\) of the ideal \(\langle f^{[2]}(x,\theta ^*)\rangle \) of \({{\,\mathrm{{\mathbb {R}}}\,}}[x]\). That is, we have both \({\mathcal {V}}_{{{\,\mathrm{{\mathbb {R}}}\,}}^n\times {{\,\mathrm{{\mathbb {R}}}\,}}^m}(f^{[1]}(x,\theta ),f^{[2]}(x,\theta ))\subset {\mathcal {V}}_{{{\,\mathrm{{\mathbb {R}}}\,}}^n\times {{\,\mathrm{{\mathbb {R}}}\,}}^m}(f^{[2]}(x,\theta ))\) and \(V_{\theta ^*}\subseteq Y_{\theta ^*}={\mathcal {V}}_{{{\,\mathrm{{\mathbb {R}}}\,}}^n\times \{\theta ^*\}}(f^{[2]}(x,\theta ^*))\subset {\mathcal {V}}_{{{\,\mathrm{{\mathbb {R}}}\,}}^n\times {{\,\mathrm{{\mathbb {R}}}\,}}^m}(f^{[2]}(x,\theta ))\), but the steady-state variety and \(V_{\theta ^*}\) are not contained in one another in general.

To apply the theory of Goeke, Walcher and Zerz in Goeke et al. (2017), we assume that the initial condition of our ODE system (Eq. (4)) lies in \(V_{\theta ^*}\). As \(D_2f^{[2]}\) has full rank on \(V_{\theta ^*}\), we have that \(x^{[2]}=\Psi \left( x^{[1]}\right) \) for some continuous \(\Psi \) by the Implicit Function Theorem. Hence, writing \(x=(x^{[1]}, x^{[2]})\), we obtain a reduced model:

$$\begin{aligned} \frac{\mathrm dx^{[1]}}{\mathrm dt}=f^{[1]}\left( \left( x^{[1]}, \Psi \left( x^{[1]}\right) \right) , {\theta ^*}\right) \end{aligned}$$
(5)

on some open set in \({{\,\mathrm{{\mathbb {R}}}\,}}^q\) over which \(V_{\theta ^*}\) is locally the graph of \(\Psi \). This corresponds to determining the fast variables in terms of the slow variables, as is done in classical QSSA by setting their rates of change equal to zero on the short timescale, with the difference that on \(V_{\theta ^*}\) the above yields an exact solution rather than an approximation. As a caveat, we note that, in both settings, it may not be possible to find an algebraic expression for \(\Psi \); this was pointed out and completely characterised by Pantea et al. (2014) in terms of Galois theory. Because of this possible non-solvability issue with Eq. (5), we require a more general methodology (Proposition 3) to study the accuracy of a model reduction (Proposition 4).

Goeke, Walcher and Zerz showed that locally, in the variable \(x^{[1]}\), the reduced system given by Eq. (5) has the same solution as the following ODE system

$$\begin{aligned} \frac{\mathrm dx^{[1]}}{\mathrm dt}=f^{[1]}\left( x, {\theta ^*}\right) ,\qquad \frac{\mathrm dx^{[2]}}{\mathrm dt}=-D_2 f^{[2]}(x, {\theta ^*})^{-1}D_1f^{[2]}(x, {\theta ^*})f^{[1]}(x, {\theta ^*}). \end{aligned}$$
(6)

Proposition 3

(Lemma 1 & Proposition 1 in Goeke et al. (2017)) Let \(V_{\theta ^*}\) be a QSS-variety. Then \(V_{\theta ^*}\) is an invariant set of Eq. (6). Moreover, any solution of Eq. (6) with initial condition in \(V_{\theta ^*}\) locally solves Eq. (5). Conversely, any solution of Eq. (5) with initial condition in \(V_{\theta ^*}\) locally solves Eq. (6). In addition, \(V_{\theta ^*}\) is an invariant set of Eq. (4) if and only if the solutions of Eqs. (4) and (6) are equal for all initial conditions in \(V_{\theta ^*}\).

This proposition equips us with a method to obtain a solution for \(x^{[1]}\) in an algebraic QSSA without explicitly determining \(\Psi \). In Sects. 4 and 5, we will use Eq. (5) as our model reduction.

First, however, we assess the accuracy of Eq. (6) as an approximation to the full system, for parameter values \(\theta \) in some neighbourhood of \(\theta ^*\). For convenience, we abbreviate system (6) as \({\mathrm dx/\mathrm dt}=f_\textrm{red}(x,\theta ^*)\).

Proposition 4

(Outline of Proposition 2 in Goeke et al. (2017)) Let \(K^*\subseteq {{\,\mathrm{{\mathbb {R}}}\,}}_+^n\times {{\,\mathrm{{\mathbb {R}}}\,}}_+^m\) be a compact domain in the product of the variable and parameter spaces which satisfies a number of conditions (we refer the interested reader to Appendix A.2 for details). Let \({\theta ^*}\) be given such that \(V_{\theta ^*}\times \{{\theta ^*}\}\) has non-empty intersection with \(\textrm{int}\,K^*\), let \((y,{\theta ^*})\) be a point in this intersection, and let \(V'_{\theta ^*}\) be an open neighbourhood of y such that \((V_{\theta ^*}\cap V'_{\theta ^*})\times \{{\theta ^*}\}\subseteq K^*\). Additionally, let \(t^*>0\) be such that the solution of Eq. (4), with initial condition y, remains in \(V'_{\theta ^*}\) for \(t\in [0,t^*]\).

Then there exists a compact neighbourhood \(A_{\theta ^*}\subseteq V_{\theta ^*}\) of y such that:

(i):

For every \(z\in A_{\theta ^*}\), the solution of Eq. (4) with initial condition z exists and remains in \(V'_{\theta ^*}\) for \(t\in [0, t^*]\).

(ii):

For every \(\varepsilon '>0\), there exists a \(\delta _1>0\) such that for every \(z\in V'_{\theta ^*}\cap A_{\theta ^*}\) the solution of Eq. (6), with initial condition z, exists and remains in \(V'_{\theta ^*}\) for \(t\in [0,t^*]\) whenever \(\Vert f-f_\textrm{red}\Vert <\delta _1\) on \(V_{\theta ^*}\).

(iii):

For every \(\varepsilon '>0\), there exists a \(\delta \in (0,\delta _1]\) such that, for any \(z\in V_{\theta ^*}\cap A_{\theta ^*}\), the difference between the solutions of Eqs. (6) and (4), with initial condition z, is at most \(\varepsilon '\) for \(t\in [0,t^*]\) whenever \(\Vert f-f_\textrm{red}\Vert <\delta \) on \(V'_{\theta ^*}\). Here, \(\Vert \,\cdot \,\Vert \) denotes the infinity-norm over the interval \([0, t^*]\) for a fixed parameter value.

In summary, given some technical assumptions on the variables and the domain \(K^*\), we can bound the difference between the solutions of Eqs. (4) and (6) in terms of \(\Vert f-f_\textrm{red}\Vert \) up to some time \(t^*>0\). The full statement of this proposition also includes lower bounds on this difference. Note that we do not assume that \(\theta ^*\) is a QSS-parameter value, but the assumptions on \(K^*\) (as detailed in Appendix A.2) require it to be close to some QSS-parameter value.

3.3 Reducing the ERK Model Algebraically

We now apply the theory from Sect. 3.2 to the Full ERK model (Eqs. (1)) in two different ways to derive two reduced models (the Linear ERK and Rational ERK models). The full details of the derivations can be found in Appendix A.1. We also give a brief biological explanation of why both systems explain the phenomena underlying the given experimental data equally well.

3.3.1 Reduction via Conservation Laws

We can exploit the conservation laws (2) and (3) to eliminate a variable before using the analytic or algebraic QSSA approach. First, we choose to eliminate E and note that there are two choices:

$$\begin{aligned} E=E_{tot}-C_1-C_2 \end{aligned}$$
(7)

or

$$\begin{aligned} E=E_{tot}-S_{tot}+S_0+S_1+S_2. \end{aligned}$$
(8)

Subsequently, we choose to eliminate the variables \(C_1\) and \(C_2\) via (algebraic) QSSA. For the Rational ERK model, using (7) to eliminate E, we obtain

$$\begin{aligned} f_\textrm{rat}^{[2]}=\begin{pmatrix}k_{f_1}(E_{tot}-C_1-C_2)\cdot S_0 -(k_{r_1}+k_{c_1})C_1\\ k_{c_1}C_1 + k_{f_2}(E_{tot}-C_1-C_2)\cdot S_1 -(k_{r_2}+k_{c_2})C_2\end{pmatrix}, \end{aligned}$$

while for the Linear ERK model, employing substitution (8), we have

$$\begin{aligned} f^{[2]}_\textrm{lin}=\begin{pmatrix}k_{f_1}(E_{tot}-S_{tot}+S_0+S_1+S_2)\cdot S_0 -(k_{r_1}+k_{c_1})C_1\\ k_{c_1}C_1 + k_{f_2}(E_{tot}-S_{tot}+S_0+S_1+S_2)\cdot S_1 -(k_{r_2}+k_{c_2})C_2\end{pmatrix}. \end{aligned}$$

3.3.2 Reduction via an Algebraic QSSA

To reduce the model further, we apply an algebraic QSSA, as described in Sect. 3.2. We start by identifying QSS parameter values. For \(f^{[2]}_\textrm{rat}\), we have

$$\begin{aligned} D_2f^{[2]}_\textrm{rat}=\begin{bmatrix}-k_{f_1}S_0-(k_{r_1}+k_{c_1}) & -k_{f_1}S_0 \\ -k_{f_2}S_1+k_{c_1} & -k_{f_2}S_1-(k_{r_2}+k_{c_2})\end{bmatrix}, \end{aligned}$$

while for \(f^{[2]}_\textrm{lin}\) we have

$$\begin{aligned} D_2f^{[2]}_\textrm{lin}=\begin{bmatrix}-(k_{r_1}+k_{c_1}) & 0 \\ k_{c_1} & -(k_{r_2}+k_{c_2})\end{bmatrix}. \end{aligned}$$

In both cases, assuming that \((k_{r_i}+k_{c_i})>0\) for \(i=1,2\) (otherwise, the reaction network would be degenerate, meaning some or all variables would remain constant), and given that \(S_0\) and \(S_1\) are non-constant, we deduce that these matrices are invertible. Hence, both substitutions (7) and (8) are good candidates for an algebraic QSSA reduction.
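These Jacobians, and the reduced right-hand side prescribed by Eq. (6), can be checked symbolically. The following sketch does this for the linear substitution (8) using sympy; it is a verification aid, not part of the original analysis.

```python
import sympy as sp

S0, S1, S2, C1, C2 = sp.symbols("S0 S1 S2 C1 C2", nonnegative=True)
kf1, kr1, kc1, kf2, kr2, kc2, Etot, Stot = sp.symbols(
    "k_f1 k_r1 k_c1 k_f2 k_r2 k_c2 E_tot S_tot", positive=True)

E = Etot - Stot + S0 + S1 + S2            # substitution (8)
x1 = sp.Matrix([S0, S1, S2])              # slow variables x^[1]
x2 = sp.Matrix([C1, C2])                  # fast variables x^[2]

f1 = sp.Matrix([-kf1*E*S0 + kr1*C1,                      # dS0/dt
                -kf2*E*S1 + kr2*C2,                      # dS1/dt
                 kc2*C2])                                # dS2/dt
f2 = sp.Matrix([kf1*E*S0 - (kr1 + kc1)*C1,               # dC1/dt
                kc1*C1 + kf2*E*S1 - (kr2 + kc2)*C2])     # dC2/dt

D1, D2 = f2.jacobian(x1), f2.jacobian(x2)
print(D2)                  # lower triangular; det = (k_r1+k_c1)(k_r2+k_c2) > 0
# Fast-variable right-hand side of the reduced system, Eq. (6):
f2_red = -D2.inv() * D1 * f1
print(sp.simplify(f2_red))
```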

We note that the assumption \(E_{tot}=0\) is required to ensure that the initial condition lies in \(V_{\theta ^*}\). This is not physically realistic, as the absence of free enzyme would make the reaction rates negligible; however, in parameter space this assumption is close to the experimental setup (\(E_{tot}\approx 0.65\mu M\)). In fact, unlike the rate parameters, we know the value of \(E_{tot}\) and can, therefore, bound the error associated with such an idealisation (cf. Appendix A.2). The assumption that \(E_{tot}=0\) is similar to the classical singular-perturbation theory approach, where the short timescale is set by the rate \(E_{tot}k_{f_1}\) and one subsequently assumes \(\varepsilon = E_{tot}/S_{tot}\rightarrow 0\).

As \(E_{tot}=0\) will yield a stationary model and ensure that \(V_{\theta ^*}\) contains the initial condition, we find that any parameter value \(\theta ^*\) satisfying \((k_{r_i}+k_{c_i})>0\) for \(i=1,2\) and \(E_{tot}=0\) is a QSS-parameter-value for both the Rational and Linear ERK model.

For both models, we have

$$\begin{aligned} Y_{\theta ^*}=\left\{ x=(S_0,S_1,S_2,C_1,C_2)\in {{\,\mathrm{{\mathbb {R}}}\,}}^5\,\vert \,f^{[2]}(x,\theta ^*)=0\right\} . \end{aligned}$$

For the Linear ERK model, we can show that \(Y_{\theta ^*}^\textrm{lin}\) is irreducible (at generic parameter values) and thus its QSS-variety is \(V_{\theta ^*}^\textrm{lin}=Y_{\theta ^*}^\textrm{lin}\). For the Rational ERK model, we have that \(Y_{\theta ^*}^\textrm{rat}\) decomposes as

$$\begin{aligned} Y_{\theta ^*}^\textrm{rat}=(Y_{\theta ^*}^\textrm{rat}\cap {\mathcal {V}}(\langle C_1+C_2\rangle ))\cup (Y_{\theta ^*}^\textrm{rat}\cap {\mathcal {V}}(\langle \lambda (k_{r_2}+k_{c_2})+S_0+\lambda k_{f_2}S_1\rangle )) \end{aligned}$$

where \(\lambda :=-k_{r_1}/(k_{f_1}(k_{c_1}-k_{c_2}-k_{r_1}))\). At generic parameter values, only the first irreducible component will contain the initial condition. Hence, the natural choice for the QSS-variety is

$$\begin{aligned} V^\textrm{rat}_{\theta ^*}=\left\{ x=(S_0,S_1,S_2,C_1,C_2)\in {{\,\mathrm{{\mathbb {R}}}\,}}^5\,\vert \,C_1=0,\,C_2=0\right\} . \end{aligned}$$

The substitution (7) yields the Rational ERK model given by

$$\begin{aligned} \frac{\mathrm dS_0}{\mathrm dt}&= \frac{-\kappa _1S_0}{ \gamma _1 S_0+\gamma _2S_1+1}, \end{aligned}$$
(9a)
$$\begin{aligned} \frac{\mathrm dS_1}{\mathrm dt}&=\frac{-\kappa _2S_1+(1-\pi )\kappa _1S_0}{ \gamma _1S_0+\gamma _2S_1+1}, \end{aligned}$$
(9b)
$$\begin{aligned} \frac{\mathrm dS_2}{\mathrm dt}&=\frac{\pi \kappa _1S_0 + \kappa _2S_1}{ \gamma _1S_0+\gamma _2S_1+1}, \end{aligned}$$
(9c)

while the substitution (8) gives the Linear ERK model:

$$\begin{aligned} \frac{\mathrm dS_0}{\mathrm dt}&= {-\kappa _1S_0}, \end{aligned}$$
(10a)
$$\begin{aligned} \frac{\mathrm dS_1}{\mathrm dt}&={-\kappa _2S_1+(1-\pi )\kappa _1S_0}, \end{aligned}$$
(10b)
$$\begin{aligned} \frac{\mathrm dS_2}{\mathrm dt}&= \pi \kappa _1S_0 + \kappa _2S_1. \end{aligned}$$
(10c)

Here, for \(i=1,2\), we use the newly introduced quantities

$$\begin{aligned} \kappa _i=E_{tot}\frac{k_{f_i}k_{c_i}}{ k_{c_i}+k_{r_i}},\qquad \pi =\frac{k_{c_2}}{ k_{c_2}+k_{r_2}},\qquad \gamma _i=k_{f_i}\frac{k_{c_1}+k_{c_2}}{\left( k_{c_1}+k_{r_1}\right) \left( k_{c_2}+k_{r_2}\right) }. \end{aligned}$$
(11)

Both models are reductions obtained via the ODE system (5). The processivity parameter, i.e. the probability that both phosphorylations are carried out by the same enzyme, is represented by \(\pi \) in the reduced models, while \(\kappa _1\) and \(\kappa _2\) represent the kinetic efficiencies of the first and second phosphorylation steps, respectively (Yeung et al. 2019).

It should be noted that the Rational ERK model is the system we would obtain via the classical singular perturbation approach (Keener and Sneyd 2011).
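Since Eqs. (10) are linear with constant coefficients, they can be integrated in closed form. Under the experimental initial conditions \(S_0(0)=S_{tot}\), \(S_1(0)=S_2(0)=0\) and assuming \(\kappa _1\ne \kappa _2\), a short calculation gives

$$\begin{aligned} S_0(t)&=S_{tot}\,e^{-\kappa _1 t},\\ S_1(t)&=\frac{(1-\pi )\kappa _1 S_{tot}}{\kappa _2-\kappa _1}\left( e^{-\kappa _1 t}-e^{-\kappa _2 t}\right) ,\\ S_2(t)&=S_{tot}-S_0(t)-S_1(t). \end{aligned}$$

These expressions are used again in Sect. 4, where the existence of analytic solutions underpins the identifiability analysis of the Linear ERK model.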

3.3.3 Assessing Accuracy

We can use the algebraic framework of Goeke, Walcher and Zerz and, in particular, Proposition 4 to bound the error of the Linear ERK model reduction relative to the full model. Given the measurements of the Michaelis–Menten constants \(k_{M_i}\), we can derive simple expressions which bound the approximation error (see Appendix A.2 for both the Rational and Linear ERK models). Unfortunately, the bound on the approximation error depends on parameters with unknown values. However, we can compare the bounds derived for the Linear ERK model to those for the Rational ERK model and show that, in the regime where \(k_{M_i}\approx 25\mu M\), both approximate the full model equally well (see Appendix A.2).

Recall that we can also derive the Rational ERK model via singular perturbation theory. When using perturbation theory, it is uncommon to bound the approximation error as explicitly as we do via the algebraic methods of Goeke et al. (2017). However, we can still show that the Linear ERK model is a good approximation of the Rational ERK model when \(0\le \gamma _1,\gamma _2\ll 1\). Again, we can use knowledge of the Michaelis-Menten constant to show that in our experimental setup, \(\gamma _1\) and \(\gamma _2\) are small. Indeed, we can rewrite

$$\begin{aligned} \gamma _1=\frac{1}{k_{M_1}}\frac{k_{c_1}+k_{c_2}}{ k_{c_2}+k_{r_2}}, \qquad \gamma _2=\frac{1}{ k_{M_2}}\frac{k_{c_1}+k_{c_2}}{ k_{c_1}+k_{r_1}}. \end{aligned}$$

Since \(k_{M_i}\approx 25\mu M\) and the parameters \(k_{c_i}\) and \(k_{r_i}\) are of similar magnitude (see Bar-Even 2011), we conclude that \(\gamma _1,\gamma _2\approx 1/25\;(1/\mu M)\). Since \(S_0+S_1\le S_{tot}=5\mu M\), this gives \(\gamma _1 S_0+\gamma _2 S_1\lesssim 0.2\ll 1\).

We reiterate that, by employing an algebraic approach, we can derive a reduced model (without taking further limits) that approximates the Full ERK model as well as the reduction obtained via singular perturbation theory, but has several advantages: it has fewer parameters, is interpretable as a chemical reaction network, and is identifiable, as discussed in the next section.

3.3.4 Choice of Output Variables

Recall from Sect. 2.2 that the experimental measurements correspond to the following linear combinations of variables: \(S_0+C_1\), \(S_1+C_2\) and \(S_2\). Here we argue that, in the context of the available data, \(S_0\), \(S_1\) and \(S_2\) are sufficient approximations of the output variables, which simplifies both the identifiability analysis and the parameter inference.

We argued in Sect. 3.3.3 that, in the context of the experimental data, the Linear ERK model is as good an approximation to the Full ERK model as the Rational ERK model. On the long timescale, the substitutions for \(C_1\) and \(C_2\) from the Linear ERK model give approximately

$$\begin{aligned} C_1&=\frac{1}{k_{M_1}}E_{tot}\cdot S_0, \\ C_2&=\frac{1}{k_{M_2}}E_{tot}\cdot S_1 + \frac{k_{c_1}}{k_{c_2}+k_{r_2}}\frac{1}{k_{M_1}}E_{tot}\cdot S_0. \end{aligned}$$

Recall that \(k_{M_i}\approx 25\mu M\) and \(E_{tot}=0.65\mu M\). We then find that the measurements of \(S_i + C_{i+1}\) will be dominated by \(S_i\). Henceforth we will use \(S_i\) interchangeably with our measurements \(S_i+C_{i+1}\).
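To see why this is justified numerically, combine the expressions above with the measured values \(k_{M_i}\approx 25\mu M\) and \(E_{tot}=0.65\mu M\):

$$\begin{aligned} \frac{C_1}{S_0}\approx \frac{E_{tot}}{k_{M_1}}\approx \frac{0.65}{25}\approx 0.026, \end{aligned}$$

and, using that \(k_{c_1}\) and \(k_{c_2}+k_{r_2}\) are of the same order of magnitude, \(C_2\) is a similarly small multiple of \(S_0\) and \(S_1\). Thus each complex contributes only a few per cent to the corresponding measured quantity \(S_i+C_{i+1}\).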

4 Identifiability

One of the goals of this ERK study is to determine the kinetic parameters of the models given the data. Each model and experimental setup induces a map from the space of model parameters to observable model solutions (here, the measurements of the 3 species at the 7 time points over the course of r experimental replicates, i.e. a subset of \({{\,\mathrm{{\mathbb {R}}}\,}}^{21r}\)). We call this map \(\phi _{t_1,\ldots ,t_7}:\Theta \rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}^{21r}\) the model prediction map (see Dufresne et al. 2018). Here, the parameter space \(\Theta \) is a subset of the positive octant \({{\,\mathrm{{\mathbb {R}}}\,}}_{\ge 0}^6\) for the Full ERK model, \({{\,\mathrm{{\mathbb {R}}}\,}}_{\ge 0}^5\) for the Rational ERK model, and \({{\,\mathrm{{\mathbb {R}}}\,}}_{\ge 0}^3\) for the Linear ERK model. One can think of the data as a point \(z^*\) in the space of observable model solutions, i.e. \({{\,\mathrm{{\mathbb {R}}}\,}}^{21r}\), and parameter estimation corresponds to attempting to compute the inverse image \(\phi _{t_1,\ldots ,t_7}^{-1}(z^*)\) of this map at that point. Structural identifiability generally corresponds to the model prediction map \(\phi _{t_1,\ldots ,t_7}\) being injective. Real-world observations are noisy; hence, the data point \(z^*\) may not lie in the image of the map \(\phi _{t_1,\ldots ,t_7}\). Thus, when performing parameter estimation, we instead search for parameters yielding model predictions close to the data point \(z^*\). Practical identifiability broadly corresponds to the set of parameters whose model predictions are close to the data point \(z^*\) being bounded. In Sect. 4.1 we show that the Linear ERK model is structurally identifiable on its whole parameter space, while the Rational ERK model and the Full ERK model are structurally identifiable on some open dense subset of their parameter space. In Sect. 4.2 we show that the Linear ERK model is practically identifiable for our experimental data, providing the proof of Theorem 1. By contrast, we provide evidence that the Rational ERK model and Full ERK model are not practically identifiable.

4.1 Structural Identifiability

First, we study the structural identifiability of our ODE models, that is whether the model prediction map \(\phi _{t_1,\ldots ,t_7}:\Theta \rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}^{21r}\) is one-to-one, or at least locally one-to-one. We start by providing a formal definition of structural identifiability for models given by ODE systems with specific time points. Suppose we have a rational ODE system in variables \(x\in {{\,\mathrm{{\mathbb {R}}}\,}}^n\) and parameters \(\theta \in {{\,\mathrm{{\mathbb {R}}}\,}}^m\), given by

$$\begin{aligned} \frac{\textrm{d} x}{\textrm{d}t} = f(x,\theta ), \end{aligned}$$
(12)

where f is a vector of rational functions in \({{\,\mathrm{{\mathbb {R}}}\,}}(x,\theta )^n\). We assume that the measurable output is \(y=g(x,\theta )\) where g is also a vector of rational functions. Let \({\hat{x}}(\theta ,t)\) be a solution of (12) for the parameter value \(\theta \in \Theta \) and then let \({\hat{y}}(\theta , t)=g({\hat{x}}(\theta ,t),\theta )\) be the observable solution for the same parameter value. Then, supposing that there are r replicates of the experiment, for the specific time points \(t_1,\ldots ,t_l\) the model prediction map is given by

$$\begin{aligned} \phi _{t_1,\ldots ,t_l}(\theta )=\underbrace{({\hat{y}}(\theta ,t_1),\ldots ,{\hat{y}}(\theta ,t_l),\dots ,{\hat{y}}(\theta ,t_1),\ldots ,{\hat{y}}(\theta ,t_l))}_{r {\text {times}}}. \end{aligned}$$

The model prediction map then induces an equivalence relation \(\sim _{t_1,\ldots ,t_l}\) on the parameter space \(\Theta \) via

$$\begin{aligned} \theta \sim _{t_1,\ldots ,t_l}\theta ' {\text { if and only if }} \phi _{t_1,\ldots , t_l}(\theta )=\phi _{t_1,\ldots , t_l}(\theta '), \end{aligned}$$

for any \(\theta ,\theta '\in \Theta \).

Definition 5

(c.f. Definition 2.8 in Dufresne et al. (2018)) Suppose we have a model given by a system of rational ODEs (as above) with parameter space \(\Theta \) and model prediction map \(\phi _{t_1,\ldots ,t_l}\). We say a model is:

  • globally identifiable if every equivalence class of \(\sim _{t_1,\ldots ,t_l}\) on \(\Theta \) has size exactly 1.

  • generically identifiable if for almost all \(\theta \in \Theta \) the equivalence class of \(\theta \) has size exactly 1.

  • locally identifiable if for almost all \(\theta \in \Theta \) the equivalence class of \(\theta \) is finite.

  • generically non-identifiable if for almost all \(\theta \in \Theta \) the equivalence class of \(\theta \) is infinite.

Here “almost all” means everywhere except possibly in a closed subvariety (i.e. the set of common zeroes of some polynomials).

There are several approaches to assessing structural identifiability. All identifiability methods involve a certain number of genericity assumptions, although these are not always made explicit (see, for example, discussions in Ovchinnikov et al. (2021), Hong et al. (2020), Joubert et al. (2021), Saccomani et al. (2003), Villaverde et al. (2018), Villaverde et al. (2019)). First, all methods assume that one has access to the whole trajectory of the observable output, and so they consider the size of the equivalence classes of the equivalence relation \(\sim _\infty \) on \(\Theta \), defined as

$$\begin{aligned} \theta \sim _\infty \theta ' {\text { if and only if }} {\hat{y}}(\theta ,t)= {\hat{y}}(\theta ',t) {\text { for all }} t\ge 0. \end{aligned}$$

For rational ODE models with time-series data, as considered here, a result of Sontag (2002) proves that if at least \(2m+1\) generic time points are observed, where m is the dimension of the parameter space, then the equivalence relation \(\sim _{t_1,\ldots ,t_{2m+1}}\) coincides with the equivalence relation \(\sim _\infty \). If there are fewer time points, or if they are not generic, it could be that almost all equivalence classes of \(\sim _\infty \) have size 1 while those of \(\sim _{t_1,\ldots ,t_l}\) are larger. For the Linear ERK model, the parameter space has dimension 3, so we have enough time points, although we do not know a priori whether they are generic. In fact, this model admits analytic solutions (see Sect. 4.2), so we can build the model prediction map explicitly and determine its identifiability directly. By a straightforward computation, we can show that for any choice of three distinct non-zero time points, the model prediction map \(\phi _{t_1,t_2,t_3}\) of the Linear ERK model is injective, and so the model is globally structurally identifiable (see Appendix A.4 for details). In particular, it follows that any choice of three distinct time points is generic. For the Rational ERK model and the Full ERK model, the parameter space has dimension 5 and 6, respectively; hence, we may not have enough time points, and we cannot determine whether the structural identifiability results below hold for these specific model prediction maps. Indeed, these two models are non-linear and do not admit analytic solutions that would allow us to make the same argument as for the Linear ERK model. This is an instance of a more general open problem:

Open Problem 6

Find and implement an algorithm to determine structural identifiability of a rational ODE model with time series data at specific given time points \(\{t_1,\ldots ,t_l\}\).
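For the Linear ERK model, this difficulty does not arise because of the closed-form solution recorded at the end of Sect. 3.3.2. As a sketch of the kind of computation behind the statement above (the full argument is in Appendix A.4), assume \(\kappa _1\ne \kappa _2\) and three distinct non-zero time points \(t_1,t_2,t_3\); then

$$\begin{aligned} \kappa _1=-\frac{1}{t_1}\log \frac{S_0(t_1)}{S_{tot}},\qquad \frac{S_1(t_2)}{S_1(t_3)}=\frac{e^{-\kappa _1 t_2}-e^{-\kappa _2 t_2}}{e^{-\kappa _1 t_3}-e^{-\kappa _2 t_3}}, \end{aligned}$$

so \(\kappa _1\) is determined by \(S_0\) at a single time point, the second relation involves only \(\kappa _2\), and, once \(\kappa _2\) is known, \(\pi \) is recovered linearly from \(S_1(t_2)\).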

Methods to assess the structural identifiability of ODE models include the classical approach via Taylor series (Pohjanpalo 1978) and generating series (Grewal and Glover 1976), and, more recently, approaches based on differential algebra (Audoly 2001; Saccomani et al. 2003; Hong et al. 2020). In this paper, we use SIAN (Hong et al. 2019), an approach based on differential algebra implemented in Maple (2019).

Similar to other methods based on differential algebra (for example, the method implemented in DAISY (Bellu et al. 2007)), SIAN is based on the differential Nullstellensatz (Ritt 1950, Chapter 1) or (Seidenberg 1952, Section 4). For a differentially closed field \({\mathbb {K}}\), this theorem establishes a correspondence between radical differential ideals and differentially closed subsets of \({\mathbb {K}}^n\). In the context of an ODE system, this implies that the solutions of the ODE system are completely determined by a prime differential ideal in a differential ring (see below). Criteria for identifiability can then be extracted from the ideal (or the quotient ring). The requirement that \({\mathbb {K}}\) is differentially closed then means that the solutions in question are possibly complex-valued, and the identifiability results will be about complex parameters, whether this is stated explicitly or not. For this reason, Hong et al. (2020) state their definition for complex parameters.

Remark

As mentioned above, the first difference between our definition of identifiability and Hong et al.’s is that their parameter space is a subset of \({{\,\mathrm{{\mathbb {C}}}\,}}^n\) instead of \({{\,\mathrm{{\mathbb {R}}}\,}}^n\). A second difference to note is that what Hong et al. (2020) call “globally identifiable” corresponds to what we call generically identifiable. Finally, Hong et al.’s (2020) definition is written for components of the parameters and makes the notion of “almost all” more precise.

The starting point is an ODE system of the same form as in Eq. (12) together with the initial condition \(x(0)=x_0\). Let Q be the least common multiple of all the polynomials appearing in the denominators in f and g. Then we have \(f=F/Q\) and \(g=G/Q\) where F and G are polynomial functions. Note that SIAN usually views the initial conditions as additional unknown components of the parameter that one may want to identify. The differential ring of interest is the differential ring \({{\,\mathrm{{\mathbb {C}}}\,}}(\theta )\{x,y\}\) (the differential ring in indeterminates x and y over the fraction field \({{\,\mathrm{{\mathbb {C}}}\,}}(\theta )\), i.e. the field of complex rational functions in the parameters). We can think of this ring as a polynomial ring in infinitely many indeterminates: \(\theta \), x, y and the infinitely many higher derivatives of x and y (i.e. \(x^{(i)}\) and \(y^{(i)}\) for \(i\ge 1\)). We are interested in the differential ideal \(I_{\Sigma }\) of \({{\,\mathrm{{\mathbb {C}}}\,}}(\theta )\{x,y\}\) given by

$$\begin{aligned} I_{\Sigma }:=\left( (Q\dot{x_i}-F_i)^{(j)},\,(Qy_k-G_k)^{(j)}\mid 1\le i\le n,\, 1\le k\le m,\, j\ge 0\right) :Q^\infty , \end{aligned}$$
(13)

where for non-empty subsets \(T, S\) of a ring R, the set \(T:S^\infty \) is defined as follows:

$$\begin{aligned} T:S^\infty :=\{r\in R\mid {\text {there exist }} s\in S, \,n\in {{\,\mathrm{{\mathbb {Z}}}\,}}_{\ge 0} {\text { such that }} s^nr\in T\}. \end{aligned}$$

Note that for polynomial systems like the Full ERK model and the Linear ERK model, we have \(Q=1\), and so the colon (saturation) operation is not needed and the ideal \(I_{\Sigma }\) is simply the differential ideal generated by the equations defining the ODE system and their derivatives. The ideal \(I_{\Sigma }\) is the ideal of all differential polynomials in \({{\,\mathrm{{\mathbb {C}}}\,}}(\theta )\{x,y\}\) that vanish on the solutions of the ODE system (12) (Saccomani et al. 2003; Hong et al. 2020).

The ideal \(I_{\Sigma }\) is prime (Hong et al. 2020) and so the quotient ring \({{\,\mathrm{{\mathbb {C}}}\,}}(\theta )\{x,y\}/I_{\Sigma }\) is an integral domain. Let \({\mathbb {K}}\) be the field of fractions of the domain \({{\,\mathrm{{\mathbb {C}}}\,}}(\theta )\{x,y\}/I_{\Sigma }\), and let \(\Bbbk \) be the subfield of \({\mathbb {K}}\) generated by the image of \({{\,\mathrm{{\mathbb {C}}}\,}}\{y\}\), that is, the subfield generated by the elements of the form \(y_i+I_{\Sigma }\). We can now state the non-constructive algebraic criterion for structural identifiability:

Proposition 7

(c.f. Proposition 3.4 in Hong et al. 2020) Suppose we have a model given by a system of rational ODEs as described above.

  • If the fields \(\Bbbk \) and \(\Bbbk (\theta )\) coincide, then the model is generically identifiable.

  • If the field extension \(\Bbbk \subseteq \Bbbk (\theta )\) is algebraic, then the model is locally identifiable.
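To illustrate the criterion on a toy example of our own (not one taken from Hong et al. 2020), consider the one-dimensional model \(\dot{x}=-\theta x\) with output \(y=x\). In the quotient ring we have \(\dot{y}=\dot{x}=-\theta x=-\theta y\), so that

$$\begin{aligned} \theta =-\frac{\dot{y}}{y}\in \Bbbk . \end{aligned}$$

Hence \(\Bbbk =\Bbbk (\theta )\) and the model is generically identifiable; this is the algebraic counterpart of reading off \(\theta \) from the observed exponential decay rate.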

Remark

Note that Proposition 3.4 in Hong et al. (2020) implies that \(\Bbbk \) and \(\Bbbk (\theta )\) coincide (respectively the field extension \(\Bbbk \subseteq \Bbbk (\theta )\) is algebraic) if and only if the model is globally identifiable (respectively, locally identifiable) in the sense of Hong et al. (2020). We are interested in something weaker; we only wish to identify parameters in the parameter space \(\Theta \), which is a subset of the real positive octant.

The criterion provided by the proposition above is not constructive, as it involves the field of rational functions of an infinitely generated \({{\,\mathrm{{\mathbb {C}}}\,}}\)-algebra. Hong et al. (2020) go on to provide a constructive version of the criterion (Hong et al. 2020, Section 3). The software SIAN (Hong et al. 2019), which we use here, is in turn based on a probabilistic version of the criterion (Hong et al. 2020, Section 4). Note that local identifiability is determined via the Taylor series approach.

We now consider the issue of initial conditions. As mentioned above, by default, SIAN considers the initial conditions as parameters that one may wish to identify. Other methods, like the differential algebra method as implemented in DAISY (Bellu et al. 2007), do not explicitly address initial conditions. Ovchinnikov et al. show in (Ovchinnikov et al. 2021, Theorem 19) that input-output identifiability corresponds to what they call multiple experiment identifiability, that is, identifiability from sufficiently many generic initial conditions. DAISY and COMBOS verify input-output identifiability (Meshkat et al. 2014).

Using SIAN (Hong et al. 2019), we verify that all three models are generically identifiable. In particular, in all three models all parameters are generically globally identifiable. Recall that this result is valid under the assumption that we have measurements at sufficiently many generic time points, and for generic initial conditions. Inspired by the discussion in Saccomani et al. (2003), in Appendix A.3 we show that the set of differential polynomials in \({{\,\mathrm{{\mathbb {C}}}\,}}(\theta )\{x,y\}\) vanishing on those solutions of system (12) with initial conditions \(S_0(0)=5\mu M\) and \(S_1(0)=S_2(0)=0\mu M\) for all three models, as well as \(C_1(0)=C_2(0)=0\mu M\) and \(E(0)=0.65\mu M\) for the Full ERK model, coincides with the ideal \(I_\Sigma \). This means that the set of solutions with initial conditions corresponding to our experimental setup is dense in the set of all solutions for the Kolchin topology (induced by the differential ideals of \({{\,\mathrm{{\mathbb {C}}}\,}}(\theta )\{x,y\}\)). We can, therefore, conclude that the initial conditions specific to the experimental setup are indeed generic, and so our structural identifiability results hold for them.

Remark

Using SIAN, we can show that the Full ERK model is also generically identifiable with the measurable outputs \(S_0+C_1\), \(S_1+C_2\) and \(S_2\), which is what was actually measured experimentally (see Sect. 3.2.4).

4.2 Practical Identifiability

If a model is generically identifiable then, generically, distinct parameters produce distinct data points. However, if there are parameter values that are arbitrarily far from one another but produce data points close to each other, parameter estimation will not be meaningful in practice. Practical (non-)identifiability aims to categorise models exhibiting such undesirable behaviour. For example, the notions of sloppiness (Gutenkunst 2007), uncertainty quantification (Smith 2013) and filtering problems (Shi et al. 1999) address mathematical models with a similar aim. We use a definition of practical identifiability introduced in Dufresne et al. (2018), which was adapted from the definition given in Raue et al. (2009).

Practical identifiability depends on more than the defining equations and the specification of the input and output of the model. Practical identifiability is influenced by the precise choice of time points, the method used for parameter estimation, the assumptions on the measurement noise of the data, and the way we measure distances in parameter space. It may also vary across regions of data space. A data point \(z^*\) is an experimental observation in the form of an N-dimensional vector whose entries are the observed values of the measured variables at each of the specific time points for each replicate of the experiment. We focus on practical identifiability for maximum likelihood estimation (MLE), one of the most widely used methods for parameter estimation (see, for example, Ljung et al. 1987). Accordingly, in the remainder of this section, we consider models \(({\mathcal {M}},\phi _{t_1,\ldots ,t_s},\psi ,d_\Theta )\) with a precise choice of model prediction map \(\phi _{t_1,\ldots ,t_s}\) with specific time points \({t_1,\ldots ,t_s}\), a specific assumption for the probability distribution \(\psi \) of the measurement noise and a choice of reference metric \(d_\Theta \) on the parameter space \(\Theta \). We also assume that the model considered is at least generically identifiable, so that MLEs exist and are unique for generic data (see (Dufresne et al. 2018, Proposition 4.15)). We write \({\hat{\theta }}(z^*)\) to denote the MLE for \(z^*\), that is, \({\hat{\theta }}(z^*):={\text {argmax}}_{\theta \in \Theta }\psi (\theta ,z^*)\).

We define a \(\delta \)-confidence region \({U}_\delta (z^*)\) as follows:

$$\begin{aligned} U_{\delta }(z^*) := \{\theta \in \Theta \mid - \log \psi (\theta ,z^*) < \delta \}. \end{aligned}$$

The set \(U_{\delta }(z^*)\), often known as a likelihood-based confidence region (Vajda et al. 1989; Casella and Berger 2002), is intimately connected with the likelihood ratio test. Specifically, suppose we had a null hypothesis \({\textbf{H}}_0\) that data point \(z^*\) has true parameter \(\theta ^*\), and we wished to test the alternative hypothesis \({\textbf{H}}_1\) that \(z^*\)’s true parameter is something else. By definition, a likelihood ratio test would reject the null hypothesis when

$$\begin{aligned} \Lambda (\theta ^*,z^*):= \frac{\psi (\theta ^*,z^*)}{\psi ({\hat{\theta }}(z^*),z^*)} \le k^*, \end{aligned}$$

where \(k^*\) is a critical value, with the significance level \(\alpha \) equal to the probability \({\text {Pr}}(\Lambda (\theta ^*,z^*)\le k^* | {\textbf{H}}_0)\) of rejecting the null hypothesis when it is in fact true. The set of parameters such that the null hypothesis is not rejected at significance level \(\alpha \) is

$$\begin{aligned} \{ \theta ' \in \Theta \mid -\log \psi (\theta ',z^*)<-\log \psi ({\hat{\theta }}(z^*),z^*)-\log k^*\}, \end{aligned}$$

that is, \(U_\delta (z^*)\), where \(\delta =-\log \psi ({\hat{\theta }}(z^*),z^*)-\log k^*\).

Definition 8

(Dufresne et al. 2018, Definition 4.17) The model \(({\mathcal {M}},\phi _{t_1,\ldots ,t_s},\psi ,d_{\Theta })\) is practically identifiable for a data point \(z^*\in {{\,\mathrm{{\mathbb {R}}}\,}}^N\) at significance level \(\alpha \) if and only if the confidence region \(U_{\delta }(z^*)\) is bounded with respect to \(d_\Theta \), where

$$\begin{aligned} \delta =-\log \psi ({\hat{\theta }}(z^*),z^*)-\log k^* \end{aligned}$$

and

$$\begin{aligned} \alpha ={\text {Pr}}\left( \frac{\psi ({\hat{\theta }}(z^*),{\hat{z}})}{{\text {max}}_{\theta \in \Theta }\psi (\theta ,{\hat{z}})}<k^* \mid {\hat{z}} {\text { is data with true parameter }} {\hat{\theta }}(z^*)\right) . \end{aligned}$$
(14)

For our analysis, we make the common assumption that the measurement noise is additive Gaussian with covariance matrix equal to a multiple of the identity matrix. This assumption is implicit when computing the MLE via a least-squares fit. In our setup, we measure 3 substances at 7 time points with r replicates, so our assumption on the measurement noise means that the probability distribution of the data is given by

$$\begin{aligned} \psi (\theta ,z)=(2\pi \sigma ^2)^{\frac{-21r}{2}}e^{-\frac{1}{2\sigma ^2}\Vert z-\phi _{t_1,\ldots ,t_7}(\theta )\Vert _2^2}, \end{aligned}$$

where \(\sigma ^2 I_{21}\) is the covariance. It then follows that

$$\begin{aligned} \delta&=-\log \psi ({\hat{\theta }}(z^*),z^*)-\log k^*\\&=\frac{21r}{2}\log (2\pi \sigma ^2)+\frac{1}{2\sigma ^2}\Vert z^*-\phi _{t_1,\ldots ,t_7}({\hat{\theta }}(z^*))\Vert _2^2-\log k^*, \end{aligned}$$

and

$$\begin{aligned} -\log \psi (\theta ',z^*)=\frac{21r}{2}\log (2\pi \sigma ^2)+\frac{1}{2\sigma ^2}\Vert z^*-\phi _{t_1,\ldots ,t_7}(\theta ')\Vert _2^2. \end{aligned}$$

Therefore, we have that

$$\begin{aligned} U_\delta (z^*)&=\{\theta '\in \Theta \mid \Vert z^*-\phi _{t_1,\ldots ,t_7}(\theta ')\Vert _2^2 <\Vert z^*-\phi _{t_1,\ldots ,t_7}({\hat{\theta }}(z^*))\Vert _2^2-2\sigma ^2\log k^*\}\\&=\phi _{t_1,\ldots ,t_7}^{-1}({B}_{\rho }(z^*)), \end{aligned}$$

where \({B}_{\rho }(z^*)\) is the Euclidean open ball of radius \(\rho :=\sqrt{(\Vert z^*-\phi _{t_1,\ldots ,t_7}({\hat{\theta }}(z^*))\Vert _2^2-2\sigma ^2\log k^*)}\) around the data point \(z^*\). It follows that under our assumptions, determining whether the various models we study are practically identifiable corresponds to determining whether the preimages under the model prediction map of small open balls around data points are bounded in parameter space. The size of the balls will depend on the data point and the significance level \(\alpha \) (or equivalently the critical value \(k^*\)).
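To make the preceding criterion concrete, the following is a minimal Python sketch (not the authors' code) of the membership test \(\theta '\in U_\delta (z^*)\) under the additive Gaussian noise assumption; the function model_prediction, returning the stacked predictions \(\phi _{t_1,\ldots ,t_7}(\theta )\), and the precomputed MLE theta_hat are hypothetical placeholders for a numerical ODE solver and a least-squares fit.

import numpy as np

def in_confidence_region(theta_prime, z_star, theta_hat, sigma2, neg_log_k_star,
                         model_prediction):
    # Check whether theta_prime lies in U_delta(z*), i.e. whether its
    # prediction phi(theta_prime) falls inside the open ball B_rho(z*).
    residual_mle = z_star - model_prediction(theta_hat)
    # rho^2 = ||z* - phi(theta_hat)||^2 - 2 sigma^2 log k*
    rho_sq = np.dot(residual_mle, residual_mle) + 2.0 * sigma2 * neg_log_k_star
    residual = z_star - model_prediction(theta_prime)
    return np.dot(residual, residual) < rho_sq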

4.3 Algorithm for Testing Practical Identifiability

The Rational ERK model and the Full ERK model do not admit analytic solutions, hence we do not have access to an explicit model prediction map \(\phi _{t_1,\ldots ,t_s}\). Therefore, we must approximate \(\phi _{t_1,\ldots ,t_s}\), and thus also \(U_\delta \), using numerical methods and repeated sampling.

First, we assume that our measurements have been corrupted by Gaussian noise with mean 0 and variance \(\sigma ^2\), identical across measured quantities, time points and replicates, and independent across measurements.

As we have assumed that the measurement noise is additive Gaussian with covariance matrix equal to a multiple of the identity matrix, we can obtain an MLE, given some data \(z^*\), by solving a least-squares problem. This gives us \({\hat{\theta }}(z^*)\). We use this parameter to calculate the sample variance, assuming that the mean of each quantity is the model trajectory at each time point. This gives us an estimate of the variance \(\sigma ^2\).

Recall that \(\delta \) is defined to be \(-\log \psi ({\hat{\theta }}(z^*),z^*)-\log k^*\). The log-likelihood is easy to compute, as we already know \(z^*\) and \({\hat{\theta }}(z^*)\), and can estimate \(\phi _{t_1,\ldots , t_7}\) using a numerical solution to the ODE system. We use the following procedure to approximate \(-\log k^*\):

Algorithm 1 (approximation of \(-\log k^*\) by repeated sampling; displayed as a figure in the published version)

This follows the definition of \(k^*\) in Eq. (14): it approximates \(-\log k^*\) by repeatedly sampling likelihood ratios under our given noise assumptions and then taking a \((1-\alpha )\)-quantile (as \(-\log (\,\cdot \,)\) is a monotonically decreasing function).
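The following is a minimal Python sketch of this sampling procedure, reconstructed from the description above rather than taken from the authors' implementation; model_prediction and fit_mle are hypothetical placeholders for the numerical model prediction map and the least-squares MLE computation.

import numpy as np

def approximate_neg_log_kstar(theta_hat, sigma2, alpha, model_prediction,
                              fit_mle, n_samples=1000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    mean = model_prediction(theta_hat)          # predicted data under the MLE
    neg_log_ratios = np.empty(n_samples)
    for i in range(n_samples):
        # simulate a data set z_hat whose true parameter is theta_hat
        z_hat = mean + rng.normal(scale=np.sqrt(sigma2), size=mean.shape)
        theta_refit = fit_mle(z_hat)            # MLE for the simulated data
        ss_null = np.sum((z_hat - model_prediction(theta_hat)) ** 2)
        ss_mle = np.sum((z_hat - model_prediction(theta_refit)) ** 2)
        # -log Lambda(theta_hat, z_hat) under additive Gaussian noise
        neg_log_ratios[i] = (ss_null - ss_mle) / (2.0 * sigma2)
    # -log(.) is decreasing, so the (1 - alpha)-quantile of -log Lambda
    # approximates -log k*
    return np.quantile(neg_log_ratios, 1.0 - alpha)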

Remark

In a situation where the number of replicates r is large, an approximate \(\delta \) can be computed from \(\alpha \) that depends primarily on the distance between the data point \(z^*\) and the predicted data point \(\phi _{t_1,\ldots ,t_7}({\hat{\theta }}({z^*}))\) corresponding to the MLE.

From the definition, we have \(\delta =-\log \psi ({\hat{\theta }}(z^*),z^*)-\log k^*\), meaning that \(k^*=e^{-\delta }/\psi ({\hat{\theta }}(z^*),z^*)\), and so we can describe \(\alpha \) in terms of \(\delta \) directly:

$$\begin{aligned} \alpha ={\text {Pr}}\left( \frac{\psi ({\hat{\theta }}(z^*),{\hat{z}})}{{\text {max}}_{\theta \in \Theta }\psi (\theta ,{\hat{z}})}<e^{-\delta }/\psi ({\hat{\theta }}(z^*),z^*) \mid {\hat{z}} {\text { is data with true parameter }} {\hat{\theta }}(z^*)\right) . \end{aligned}$$

This is equivalent to

$$\begin{aligned} \alpha ={\text {Pr}}\left( -\log \left( \frac{\psi ({\hat{\theta }}(z^*),{\hat{z}})}{{\text {max}}_{\theta \in \Theta }\psi (\theta ,{\hat{z}})}\right) >-\log \left( e^{-\delta }/\psi ({\hat{\theta }}(z^*),z^*)\right) \mid {\hat{z}} {\text { has true parameter }} {\hat{\theta }}(z^*)\right) , \end{aligned}$$

and so

$$\begin{aligned} \alpha&={\text {Pr}}\left( -\log \psi ({\hat{\theta }}(z^*),{\hat{z}})+\log {{\text {max}}_{\theta \in \Theta }\psi (\theta ,{\hat{z}})}>\delta \right. \\&\quad \left. +\log \psi ({\hat{\theta }}(z^*),z^*) \mid {\hat{z}} {\text { has true parameter }} {\hat{\theta }}(z^*)\right) . \end{aligned}$$

Note that for each value of \({\hat{z}}\), the MLE \({\hat{\theta }}({\hat{z}})\) maximises \(\psi (\theta ,{\hat{z}})\), so that \({\text {max}}_{\theta \in \Theta }\psi (\theta ,{\hat{z}})=\psi ({\hat{\theta }}({\hat{z}}),{\hat{z}})\). Multiplying both sides of the inequality by 2, it follows that

$$\begin{aligned} \alpha&={\text {Pr}}\left( 2(\log \psi ({\hat{\theta }}({\hat{z}}),{\hat{z}})-\log \psi ({\hat{\theta }}(z^*),{\hat{z}}))>2\delta \right. \\&\quad \left. +2\log \psi ({\hat{\theta }}(z^*),z^*) \mid {\hat{z}} {\text { has true parameter }} {\hat{\theta }}(z^*)\right) . \end{aligned}$$

Wilks’ theorem (Fan et al. 2000) implies that \(2(\log \psi ({\hat{\theta }}({\hat{z}}),{\hat{z}})-\log \psi ({\hat{\theta }}(z^*),{\hat{z}}))\) is asymptotically \(\chi ^2\)-distributed with three degrees of freedom. If F is the asymptotic cumulative distribution function of \(2(\log \psi ({\hat{\theta }}({\hat{z}}),{\hat{z}})-\log \psi ({\hat{\theta }}(z^*),{\hat{z}}))\), then \(\alpha \) is approximately equal to

$$\begin{aligned} \alpha&=1-{\text {Pr}}\left( 2(\log \psi ({\hat{\theta }}({\hat{z}}),{\hat{z}})-\log \psi ({\hat{\theta }}(z^*),{\hat{z}}))<2\delta \right. \\&\quad \left. +2\log \psi ({\hat{\theta }}(z^*),z^*) \mid {\hat{z}} {\text { has true parameter }} {\hat{\theta }}(z^*)\right) \\&\approx 1-F(2\delta +2\log \psi ({\hat{\theta }}(z^*),z^*)). \end{aligned}$$

Therefore, asymptotically we have that

$$\begin{aligned} \delta =F^{-1}(1-\alpha )/2-\log \psi ({\hat{\theta }}(z^*),z^*). \end{aligned}$$

Unfortunately, this is not applicable here, as the number of replicates is 5, 6 or 11, which are not large numbers. Indeed, the \(\delta \) obtained by applying Wilks’ theorem and the \(\delta \) obtained via Algorithm 1 are notably different. For example, for the wild-type data and the Linear ERK model, we approximate \(-\log k^*\) as 0.477, while Wilks’ theorem approximates it as 3.907.
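For reference, the Wilks approximation quoted above is a one-line computation: with three degrees of freedom and \(\alpha =0.05\), \(F^{-1}(1-\alpha )/2\) gives approximately 3.907.

from scipy.stats import chi2

alpha = 0.05
# F^{-1}(1 - alpha)/2 for the chi-squared distribution with 3 degrees of freedom
neg_log_kstar_wilks = chi2.ppf(1.0 - alpha, df=3) / 2.0   # ~3.907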

In order to demonstrate practical non-identifiability for the Full and Rational ERK models, we pick two parameters from each model for which the confidence regions, marginalised to these two parameters, illustrate the non-identifiability clearly. This choice of parameters is informed by first performing an (ill-posed) Bayesian parameter inference (see the next section). The procedure is described here for the Rational ERK model, but works similarly for the Full ERK model:

Algorithm 2 (approximation of the confidence area marginalised to two parameters; displayed as a figure in the published version)

While we do not know the values of \(\kappa _1\) and \(\gamma _1\), previous experimental work has provided bounds for \(\kappa _1\) and \(\gamma _1\), which we pass to the algorithm above. The list returned by the algorithm is a discrete approximation of the confidence area, marginalised to the pair of parameters \(\kappa _1\) and \(\gamma _1\). We plot these points for visual inspection, which can be seen in Fig. 2. The blue area reaching the upper and leftmost boundary of the plot indicates that the confidence region is very unlikely to be bounded and that this model is very unlikely to be practically identifiable.
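The sketch below illustrates one way to realise the marginalisation step, based on our reading of the description above rather than on the authors' published code: a pair \((\kappa _1,\gamma _1)\) belongs to the marginalised confidence area precisely when it extends, by some choice of the remaining parameters, to a point of \(U_\delta (z^*)\). The helper residual_sq, returning \(\Vert z^*-\phi _{t_1,\ldots ,t_7}(\theta )\Vert _2^2\) for a full parameter vector, is hypothetical.

import numpy as np
from scipy.optimize import minimize

def marginalised_confidence_area(z_star, rho_sq, bounds_k1, bounds_g1,
                                 residual_sq, other_init, n_grid=50):
    kept = []
    for k1 in np.linspace(*bounds_k1, n_grid):
        for g1 in np.linspace(*bounds_g1, n_grid):
            # optimise the remaining parameters with (kappa_1, gamma_1) fixed
            res = minimize(lambda rest: residual_sq(k1, g1, rest, z_star),
                           x0=other_init, method="Nelder-Mead")
            if res.fun < rho_sq:
                kept.append((k1, g1))
    return kept  # discrete approximation of the marginalised confidence area

# With the bounds of Fig. 2:
# marginalised_confidence_area(z_star, rho_sq, (0.0, 1000.0), (0.0, 1000.0),
#                              residual_sq, other_init)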

Fig. 2

(Color figure online) Marginalised confidence area following Algorithm 2 at significance level 0.05 for the Rational ERK model for the wild-type data point \(z^*\), with \(\kappa _1^{{\min }}=0\,(1/\min )\), \(\kappa _1^{{\max }}=1000\,(1/\min )\), \(\gamma _1^{{\min }}=0\,(1/\mu \hbox {M})\) and \(\gamma _1^{{\max }}=1000\,(1/\mu \hbox {M})\)

The source of this practical non-identifiability of the Full ERK model and the Rational ERK model is not completely clear. One possible source of non-identifiability could be the choice of time points. Indeed, as mentioned in Sect. 4.1, in both cases we do not know whether the time points are sufficiently generic. There are reasons to believe that not all of the practical non-identifiability can be explained by an insufficient number of time points. Indeed, as part of earlier work during the preparation of Yeung et al. (2019), additional time-point data were simulated for the Full ERK model, but the confidence regions still appeared unbounded. Another possible source of non-identifiability could be that, for the given experimental data, there is a valid quasi-steady-state approximation resulting in a lower-dimensional parameter space. At quasi-steady-state parameter values the reduction is exact, and so for these parameters the equivalence class of \(\sim _{t_1,\ldots ,t_7}\) is positive dimensional. Intuitively, since the solutions of the Full ERK model and the Rational ERK model are close to those of the Linear ERK model near quasi-steady-state parameter values, the confidence regions should contain the equivalence class of the nearby quasi-steady-state parameter value, which in this case is unbounded. This might be an example of a more widespread phenomenon.

4.4 The Practical Identifiability of the Linear ERK Model

We now consider the practical identifiability of the Linear ERK model. What distinguishes the Linear ERK model from the Full ERK model and the Rational ERK model is that an analytic solution to the ODE system is available and so we can construct an explicit model prediction map. The solution to the ODE system (10) with initial conditions \(S_0(0)=5\mu M\) and \(S_1(0)=S_2(0)=0\) is given by:

$$\begin{aligned} {S}_0(t)&=5e^{-\kappa _1t},\\ {S}_1(t)&={\left\{ \begin{array}{ll} 5\kappa _1(1-\pi )\,t\, e^{-\kappa _1t} &{} {\text {if }}\; \kappa _1=\kappa _2,\\ 5\kappa _1(1-\pi )(e^{-\kappa _2t}-e^{-\kappa _1t})/(\kappa _1-\kappa _2) &{} {\text {otherwise,}}\end{array}\right. }\\ {S}_2(t)&=5-{S}_0(t)-{S}_1(t). \end{aligned}$$
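This analytic solution translates directly into an explicit model prediction map; the following is a straightforward transcription of the formulas above (concentrations in \(\mu M\), time in minutes), and stacking its values at the seven measurement times yields \(\phi _{t_1,\ldots ,t_7}(\theta )\).

import numpy as np

def linear_erk_solution(t, kappa1, kappa2, pi):
    S0 = 5.0 * np.exp(-kappa1 * t)
    if np.isclose(kappa1, kappa2):
        S1 = 5.0 * kappa1 * (1.0 - pi) * t * np.exp(-kappa1 * t)
    else:
        S1 = (5.0 * kappa1 * (1.0 - pi)
              * (np.exp(-kappa2 * t) - np.exp(-kappa1 * t)) / (kappa1 - kappa2))
    S2 = 5.0 - S0 - S1
    return S0, S1, S2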

As we did for the Rational ERK model in Sect. 4.3, for a given data point \(z^*\), we obtain an MLE \({\hat{\theta }}(z^*)\) by solving a least-squares problem. We then use Algorithm 1 to approximate \(-\log k^*\), and then \(\delta \), using the explicit model prediction map we construct based on the analytic solutions. In Fig. 3 we plot the boundary of the confidence regions at significance level \(\alpha =0.05\) for the data points corresponding to the wild-type and each mutant. All five confidence regions are seen to be bounded, and we conclude that the model is practically identifiable for those data points.

Fig. 3

(Color figure online) Boundary of the confidence regions for wild-type and each mutant at significance level 0.05 for the Linear ERK model

5 Topological Data Analysis for Kinetic Parameter Inference

Since the Linear ERK model is practically identifiable, we now infer the parameters of this model using data from the wild-type and mutant experiments. First, we briefly review the Bayesian approach for inferring the parameters of the Linear ERK model, as already computed by Yeung et al. (2019). We then introduce topological data analysis (TDA) and present previous results that enable us to analyse the parameter samples drawn from the posteriors of the wild-type (WT) and the four mutants. Specifically, we exploit a theorem by Bobrowski et al. (2017) for hypothesis testing of topological distances in noisy settings. We implement their theoretical result and compare the topological distances between the WT and the mutants.

5.1 Bayesian Inference

Given experimental data and a mathematical model, we seek to infer parameters for which the model accurately fits the data. We choose to do this via Bayesian inference. Bayesian statistics captures, in the language of probability theory, how our belief in the true values of these parameters changes as we make observations (in this case, measurements). Most importantly, Bayesian inference does not infer a single value for each parameter, as a frequentist approach would; rather, it infers a probability distribution of parameter values expressing how strongly we believe a certain set of parameter values to be correct.

Formally, we are given a parameter space \(\Theta \) and observations x from some sample space \({\mathcal {X}}\). Combining the mathematical model with noise assumptions on available measurements, we obtain an expression for \(p(x|\theta )\), the likelihood of observing x assuming that the parameter of the model is \(\theta \in \Theta \). In addition, we need to specify a measure of belief in the parameter values before we observe any data, expressed through a probability density \(p(\theta )\), called the prior distribution. Theoretically, we want to inform a Bayesian inference only through observations. Consequently, we do not want to inform the inference by placing strong prior beliefs on certain parameter values. In practice, however, a trade-off between neutral prior beliefs (which should only account for substantive prior knowledge and possibly scientific conjectures), analytical convenience and computational tractability is commonplace (Gelman and Shalizi 2013, 11-12).

Having selected a mathematical model and a prior distribution, upon making observations \(x\in {\mathcal {X}}\) our belief in the parameter values is updated to

$$\begin{aligned} p(\theta |x)\propto p(x|\theta )\cdot p(\theta ) \end{aligned}$$

The probability density \(p(\theta |x)\) is called the posterior distribution. The proportionality in the above equation indicates that we have omitted a normalisation constant which is independent of \(\theta \). As one can approximately sample from \(p(\theta |x)\) without normalising, this constant is not needed for our application.

For the Linear ERK model (Eqs. (10)), the parameter is \(\theta =(\kappa _1, \kappa _2, \pi , \sigma )\in {{\,\mathrm{{\mathbb {R}}}\,}}^4=\Theta \). Here, the first three components are the parameters of the Linear ERK model, while \(\sigma \), the standard deviation of the measurement noise, must be inferred in order to construct a Bayesian model and is subsequently marginalised (i.e. integrated out). The observations are measurements of \(S_0\), \(S_1\) and \(S_2\). As measurements of each MEK type are taken from r replicates, at 7 different times, for 3 phosphorylation states of the substrate, we formally have \({\mathcal {X}}={{\,\mathrm{{\mathbb {R}}}\,}}^{r\cdot 3\cdot 7}={{\,\mathrm{{\mathbb {R}}}\,}}^{r\cdot 21}\). We have \(r=11\) for the wild-type, \(r=6\) for SSDD and \(r=5\) for all other variants.

To construct a statistical model on top of the mechanistic Linear ERK model, we set the prior distributions to

$$\begin{aligned} \kappa _1,\kappa _2 \sim {\textit{Unif}}(0\,(1/{\min }),10\,(1/{\min })),\quad \sigma \sim {\textit{Unif}}(0\,(\mu M),10\,(\mu M)), \end{aligned}$$

a uniform distribution over values we deem biologically feasible for these parameters (Yeung et al. 2019), and \(\pi \sim {\textit{Unif}}(0,1)\), as \(\pi \) can only take values within this range by definition.

Given samples \(S_0^*\), \(S_1^*\) and \(S_2^*\), we assume that

$$\begin{aligned} \left( S_0^*\right) _{t,i}\sim {\mathcal {N}}\left( S_0(\kappa _1,\kappa _2,\pi ,t),\sigma \right) ,\\ \left( S_1^*\right) _{t,i}\sim {\mathcal {N}}\left( S_1(\kappa _1,\kappa _2,\pi ,t),\sigma \right) ,\\ \left( S_2^*\right) _{t,i}\sim {\mathcal {N}}\left( S_2(\kappa _1,\kappa _2,\pi ,t),\sigma \right) , \end{aligned}$$

where t denotes the respective measurement time and i indexes the sample. Here, \(S_j(\kappa _1,\kappa _2,\pi ,t)\) is a solution to the ODE system at time t for parameters \(\kappa _1\), \(\kappa _2\) and \(\pi \). For the Linear ERK model, we can construct an analytic solution to the governing equations, but generally, a numerical solution suffices. Such ODE solutions give rise to an expression for the likelihood \(p(x\vert \theta )\).
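For illustration, the following is a minimal PyStan (version 2 interface) sketch of the statistical model just described; the Stan code, variable names and data layout are our own illustrative assumptions rather than the authors' original script, and the analytic solution is written only for the generic case \(\kappa _1\ne \kappa _2\). The uniform priors above are imposed implicitly through the declared parameter bounds.

import pystan

stan_code = """
data {
  int<lower=1> r;                  // number of replicates
  int<lower=1> n_t;                // number of time points (7 here)
  vector[n_t] ts;                  // measurement times (min)
  matrix[r, n_t] S0_obs;           // measured S0, S1, S2 (muM)
  matrix[r, n_t] S1_obs;
  matrix[r, n_t] S2_obs;
}
parameters {
  // the declared bounds impose the uniform priors described in the text
  real<lower=0, upper=10> kappa1;  // 1/min
  real<lower=0, upper=10> kappa2;  // 1/min
  real<lower=0, upper=1>  prc;     // processivity parameter pi
  real<lower=0, upper=10> sigma;   // muM
}
model {
  for (j in 1:n_t) {
    // analytic Linear ERK solution, generic case kappa1 != kappa2
    real S0 = 5 * exp(-kappa1 * ts[j]);
    real S1 = 5 * kappa1 * (1 - prc)
              * (exp(-kappa2 * ts[j]) - exp(-kappa1 * ts[j])) / (kappa1 - kappa2);
    real S2 = 5 - S0 - S1;
    for (i in 1:r) {
      S0_obs[i, j] ~ normal(S0, sigma);
      S1_obs[i, j] ~ normal(S1, sigma);
      S2_obs[i, j] ~ normal(S2, sigma);
    }
  }
}
"""

# Example usage for the wild-type (r = 11), with times and the S*_obs arrays
# taken from the experimental data:
# model = pystan.StanModel(model_code=stan_code)
# fit = model.sampling(data=dict(r=11, n_t=7, ts=times, S0_obs=S0_obs,
#                                S1_obs=S1_obs, S2_obs=S2_obs),
#                      iter=2000, chains=4)
# samples = fit.extract(permuted=True)   # draws from the posterior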

We note that in the above Bayesian model, some standard simplifying assumptions were made. First, in the given setup, negative measured values of \(S_0\), \(S_1\) and \(S_2\) are assigned strictly positive likelihood, even though concentrations cannot be negative. Second, we assume that \((S_0^*)_{t,i}\), \((S_1^*)_{t,i}\) and \((S_2^*)_{t,i}\) are independent random variables for all t and i and that they have the same standard deviation. Despite these assumptions, we obtained good fits to the data. In particular, performing an inference with three different standard deviation parameters \(\sigma _0\), \(\sigma _1\) and \(\sigma _2\) for \(S_0\), \(S_1\) and \(S_2\), respectively, did not significantly improve the fits to the data.

This Bayesian inference framework can also be applied to other ODE models describing the measurements, including the Rational ERK model (Eqs. (9)) and the Full ERK model (Eqs. (1)). In these cases, we employ numerical solutions and adapt priors to the larger parameter spaces.

We note that for the Full ERK model and the Rational ERK model, the choice of prior distributions significantly changes both the location and the prominence of the modes of the posterior distributions. In particular, the modes tend to lie near the endpoints of the prior supports. This is linked to the practical non-identifiability of these models; it prevents us from interpreting the parameter modes and from conducting a sensible topological comparison that is not highly dependent on the choice of prior distribution.

In order to compute posterior distributions of the involved parameters, we used PyStan, the Python interface to the statistical software STAN (Carpenter et al. 2017). While analytical expressions for the posterior distributions are too complex to interpret directly, PyStan enables us to sample approximately from them via Hamiltonian Monte Carlo. The resulting samples (visualised in Fig. 4) form the basis of our further analysis.

Fig. 4

(Color figure online) a Random samples from the posterior distributions for the WT and all mutants (2000 points each); b–d approximate marginal densities for \(\kappa _1\) (b), \(\kappa _2\) (c) and \(\pi \) (d), in the same colour scheme

5.2 Topological Analysis

To analyse the topology of the samples of the resulting posterior distributions, we introduce notation and methodology from Topological Data Analysis (TDA).

Definition 9

Let \({\textbf{v}}\) be a finite set of vertices. A subset of the power-set of \({\textbf{v}}\), \({{\,\mathrm{{\mathcal {K}}}\,}}\subseteq {\mathcal {P}}({\textbf{v}})\), is called a simplicial complex if for any \(\tau \in {{\,\mathrm{{\mathcal {K}}}\,}}\) the relation \(\tau '\subseteq \tau \) implies \(\tau '\in {{\,\mathrm{{\mathcal {K}}}\,}}\).

We write \({{\,\mathrm{{\mathcal {K}}}\,}}_i=\{\tau \in {{\,\mathrm{{\mathcal {K}}}\,}}\,\vert \,|\tau |=i+1\}\) and call the elements of \({{\,\mathrm{{\mathcal {K}}}\,}}_i\) the i-simplices. A map \(h:{\textbf{v}}\rightarrow {\textbf{v}}'\) which extends to a map \(h:{\mathcal {K}}\rightarrow {\mathcal {K}}'\) by \(h(\tau ):=\{h(v)\,\vert \, v\in \tau \}\) for each \(\tau \in {{\,\mathrm{{\mathcal {K}}}\,}}\) is called a simplicial map.

Fig. 5

(Color figure online) a An example of a simplicial complex on vertices \(v_0\), \(v_1\), \(v_2\) and \(v_3\) (left) and its geometrical realisation (right); b An example of a filtration of a simplicial complex, visualised on geometric realisations

We can view a simplicial complex as a combinatorial description of a topological space. Given a simplicial complex \({{\,\mathrm{{\mathcal {K}}}\,}}\), we can investigate its geometric realisation

$$\begin{aligned} \left| {{\,\mathrm{{\mathcal {K}}}\,}}\right| :=\bigcup _{\tau \in {{\,\mathrm{{\mathcal {K}}}\,}}}\textrm{cvx}(\tau )\subseteq {{\,\mathrm{{\mathbb {R}}}\,}}\langle {\textbf{v}}\rangle , \end{aligned}$$

where \(\textrm{cvx}\) denotes the convex hull in the real free vector space generated by the vertices \({\textbf{v}}\). The realisation \(|{{\,\mathrm{{\mathcal {K}}}\,}}|\) is endowed with the subspace topology in \({{\,\mathrm{{\mathbb {R}}}\,}}\langle {\textbf{v}}\rangle \). An example of a simplicial complex and its geometric realisation can be found in Fig. 5a. Since \({{\,\mathrm{{\mathcal {K}}}\,}}\) is a discrete and combinatorial entity, one can compute meaningful topological information about topological spaces (or datasets) described by simplicial complexes.

5.2.1 Homology

One topological invariant we can compute from simplicial complexes is homology. In each dimension k, the dimension of the k-th homology group can be thought of as the number of voids in a simplicial complex enclosed by a k-dimensional boundary. We restrict our definition of homology to coefficients in the field with two elements, \({{\,\mathrm{{\mathbb {F}}}\,}}_2\), which is the setting for our computations. For a simplicial complex, the homology groups coincide with those of its geometric realisation (viewed as a topological space).

Definition 10

Let \({{\,\mathrm{{\mathcal {K}}}\,}}\) be a simplicial complex. We define its chain complex \({\mathcal {C}}_\bullet ({{\,\mathrm{{\mathcal {K}}}\,}})\) over \({{\,\mathrm{{\mathbb {F}}}\,}}_2\) to be the collection of vector spaces \({\mathcal {C}}_i={{\,\mathrm{{\mathbb {F}}}\,}}_2\langle {{\,\mathrm{{\mathcal {K}}}\,}}_i\rangle \), together with the collection of linear maps \(\partial _i:{\mathcal {C}}_i\rightarrow {\mathcal {C}}_{i-1}\) induced by

$$\begin{aligned} \partial _i: \tau \mapsto \sum _{v\in \tau }\tau \backslash \{v\} \end{aligned}$$

for all \(\tau \in {{\,\mathrm{{\mathcal {K}}}\,}}_i\).

We observe that \(\partial _i\circ \partial _{i+1}=0\) for all i. Furthermore, we note that any simplicial map \(h:{{\,\mathrm{{\mathcal {K}}}\,}}\rightarrow {{\,\mathrm{{\mathcal {K}}}\,}}'\) induces a collection of maps on corresponding chain complexes \({\mathcal {C}}_\bullet \) and \({\mathcal {C}}'_\bullet \), denoted \(\{{\hat{h}}_i:{\mathcal {C}}_i\rightarrow {\mathcal {C}}'_i\}_i\), which are defined as

$$\begin{aligned} {\hat{h}}_i(\tau ):={\left\{ \begin{array}{ll}h(\tau ) &{} {\text {if }}\; \dim h(\tau )=\dim \tau \\ 0 &{} {\text {otherwise}}\end{array}\right. }. \end{aligned}$$

We call such a collection of maps a chain map from \({\mathcal {C}}_\bullet \) to \({\mathcal {C}}'_\bullet \). It satisfies \(\partial '_i\circ {\hat{h}}_i={\hat{h}}_{i-1}\circ \partial _i\) for all i.

Definition 11

Let \({{\,\mathrm{{\mathcal {K}}}\,}}\) be a simplicial complex and let \({\mathcal {C}}_\bullet ({{\,\mathrm{{\mathcal {K}}}\,}})\) be its associated chain complex over \({{\,\mathrm{{\mathbb {F}}}\,}}_2\). Then the k-th homology group of \({{\,\mathrm{{\mathcal {K}}}\,}}\) is defined to be the quotient of vector spaces

$$\begin{aligned} H_k({{\,\mathrm{{\mathcal {K}}}\,}}):=\frac{\ker \partial _k}{\textrm{im}\,\partial _{k+1}}. \end{aligned}$$

Note that for \({\hat{h}}:{\mathcal {C}}_\bullet ({{\,\mathrm{{\mathcal {K}}}\,}})\rightarrow {\mathcal {C}}_\bullet ({{\,\mathrm{{\mathcal {K}}}\,}}')\) the induced map \(h^*:H_k({{\,\mathrm{{\mathcal {K}}}\,}})\rightarrow H_k({{\,\mathrm{{\mathcal {K}}}\,}}')\) given by \(h^*:[c]\mapsto [{\hat{h}}_k(c)]\), where \(c\in \ker \partial _k\) and the brackets denote equivalence up to translation by \(\textrm{im}\,\partial _{k+1}\) and \(\textrm{im}\,\partial '_{k+1}\) respectively, is well defined for all k (Otter 2017). Moreover, for simplicial maps \(h:{{\,\mathrm{{\mathcal {K}}}\,}}\rightarrow {{\,\mathrm{{\mathcal {K}}}\,}}'\) and \(h':{{\,\mathrm{{\mathcal {K}}}\,}}'\rightarrow {{\,\mathrm{{\mathcal {K}}}\,}}''\) we have \((h'\circ h)^*= (h')^*\circ h^*\). This property is called the functoriality of homology and will be used when we introduce persistence.
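As a small worked example of Definitions 10 and 11, the following self-contained Python computation finds the \({{\,\mathrm{{\mathbb {F}}}\,}}_2\) Betti numbers (the dimensions of the homology groups) of the hollow triangle, i.e. the complex with vertices \(\{0\},\{1\},\{2\}\) and edges \(\{0,1\},\{0,2\},\{1,2\}\); all helper names are ours, and the expected output is \(b_0=b_1=1\), as for a circle.

import numpy as np
from itertools import combinations

def rank_gf2(M):
    # rank of a 0/1 matrix over F_2 via Gaussian elimination
    M = M.copy() % 2
    rank, rows, cols = 0, M.shape[0], M.shape[1]
    for col in range(cols):
        pivot = next((i for i in range(rank, rows) if M[i, col]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]
        for i in range(rows):
            if i != rank and M[i, col]:
                M[i] = (M[i] + M[rank]) % 2
        rank += 1
    return rank

def boundary_matrix(simplices_k, simplices_km1):
    # matrix of the boundary map C_k -> C_{k-1} over F_2
    index = {s: i for i, s in enumerate(simplices_km1)}
    D = np.zeros((len(simplices_km1), len(simplices_k)), dtype=int)
    for j, s in enumerate(simplices_k):
        for face in combinations(s, len(s) - 1):
            D[index[face], j] = 1
    return D

vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]

r1 = rank_gf2(boundary_matrix(edges, vertices))   # rank of the boundary map d_1
b0 = len(vertices) - r1        # dim ker d_0 - rank d_1, with d_0 = 0
b1 = (len(edges) - r1) - 0     # dim ker d_1 - rank d_2 (no 2-simplices)
print(b0, b1)                  # 1 1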

5.2.2 Persistence

We view a point cloud as a discrete subset of a continuous geometric object embedded in Euclidean space, and this underlying continuous space is the primary subject of interest. In order to obtain information about this geometric object, we wish to inflate our discrete points to a continuous space, or to capture the relative offsets between points in this space. In practice, we usually do not know the adequate inflation resolution. Persistence theory offers an elegant way to overcome this difficulty by scaling the resolution from fine to coarse and tracking how the homology of the resulting spaces evolves via their canonical inclusions.

Definition 12

Let \({{\,\mathrm{{\mathcal {K}}}\,}}\) be a simplicial complex and let \(g:{{\,\mathrm{{\mathcal {K}}}\,}}\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\) be a function such that \(\tau \subseteq \tau '\) implies \(g(\tau )\le g(\tau ')\) for any \(\tau ,\tau '\in {\mathcal {K}}\). A filtration of the simplicial complex \({{\,\mathrm{{\mathcal {K}}}\,}}\) by g is then defined to be the sequence of simplicial complexes \(\{{{\,\mathrm{{\mathcal {K}}}\,}}_L\}_{L\in {{\,\mathrm{{\mathbb {R}}}\,}}}\), where

$$\begin{aligned} {{\,\mathrm{{\mathcal {K}}}\,}}_L:=\{\tau \in {{\,\mathrm{{\mathcal {K}}}\,}}\,\vert \,g(\tau )\le L\}, \end{aligned}$$

together with the canonical inclusions \(\iota _L^{L'}:{{\,\mathrm{{\mathcal {K}}}\,}}_L\hookrightarrow {{\,\mathrm{{\mathcal {K}}}\,}}_{L'}\) whenever \(L\le L'\). An example of a filtration is visualised in Fig. 5b. In the same spirit, let \({\mathcal {T}}\) be a topological space and \(g:{\mathcal {T}}\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\) be a continuous function. A filtration of the topological space \({\mathcal {T}}\) is then defined to be the sequence of topological spaces \(\{{\mathcal {T}}_L\}_{L\in {{\,\mathrm{{\mathbb {R}}}\,}}}\), where

$$\begin{aligned} {\mathcal {T}}_L:=\{x\in {\mathcal {T}}\,\vert \,g(x)\le L\}, \end{aligned}$$

together with the canonical inclusions \(\iota _L^{L'}:{\mathcal {T}}_L\hookrightarrow {\mathcal {T}}_{L'}\) whenever \(L\le L'\).

A common way of constructing a filtration from a point cloud \({\textbf{v}}\subset {{\,\mathrm{{\mathbb {R}}}\,}}^d\) is to set \({{\,\mathrm{{\mathcal {K}}}\,}}={\mathcal {P}}({\textbf{v}})\) and \(g(\tau )=\max \{d(x,y)\,\vert \,x,y\in \tau \}\). This is called the Vietoris–Rips filtration, and \({{\,\mathrm{{\mathcal {K}}}\,}}_L\) is a good approximation to an inflation of \({\textbf{v}}\) obtained by placing balls of radius L/2 at each point (Oudot 2015). We will consider the following alternative filtration. For a fixed \(L\in {{\,\mathrm{{\mathbb {R}}}\,}}\) and a map \(p:{{\,\mathrm{{\mathbb {R}}}\,}}^d\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\), we set \({{\,\mathrm{{\mathcal {K}}}\,}}':={{\,\mathrm{{\mathcal {K}}}\,}}_L\) in the Vietoris–Rips sense and consider the filtration by the map \(g':{{\,\mathrm{{\mathcal {K}}}\,}}'\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\) defined by \(g'(\tau ):=\max \{p(x)\,\vert \,x\in \tau \}\).

Definition 13

Let \({{\,\mathrm{{\mathbb {F}}}\,}}_2[t]\) be the ring of polynomials in the indeterminate t with coefficients in \({{\,\mathrm{{\mathbb {F}}}\,}}_2\). Let \(\{{{\,\mathrm{{\mathcal {K}}}\,}}_L\}_{L\in {{\,\mathrm{{\mathbb {R}}}\,}}}\) be a filtration of a simplicial complex. Moreover, define \(\textrm{Crit}:=\{L\in {{\,\mathrm{{\mathbb {R}}}\,}}\vert \iota ^L_{L-\varepsilon }\ne \textrm{id}\,\forall \varepsilon >0\}\), the set of all L at which \({{\,\mathrm{{\mathcal {K}}}\,}}_L\) changes (which is a finite set, as \({{\,\mathrm{{\mathcal {K}}}\,}}\) is finite). Define the function \(c:{\mathbb {N}}_0\rightarrow \textrm{Crit}\cup \{\inf \textrm{Crit}-1\}\) by mapping 0 to \(\inf \textrm{Crit}-1\) and \(n>0\) to the n-th smallest element of \(\textrm{Crit}\) (by convention, we map integers bigger than the cardinality of \(\textrm{Crit}\) to the largest element of \(\textrm{Crit}\)).

For a fixed integer k, let \(H_k(\,\cdot \,)\) denote the k-th simplicial homology with coefficients in \({{\,\mathrm{{\mathbb {F}}}\,}}_2\). Define

$$\begin{aligned} M_k:=\bigoplus _{n\in {{\,\mathrm{{\mathbb {N}}}\,}}_0}H_k\left( {{\,\mathrm{{\mathcal {K}}}\,}}_{c(n)}\right) \end{aligned}$$
(15)

together with the action of \({{\,\mathrm{{\mathbb {F}}}\,}}_2[t]\) on \(M_k\) induced by \(t^a\cdot x = \left( \iota _{c(n)}^{c(n+a)}\right) ^*(x)\in H_k({{\,\mathrm{{\mathcal {K}}}\,}}_{c(n+a)})\) for \(x\in H_k({{\,\mathrm{{\mathcal {K}}}\,}}_{c(n)})\) and a non-negative integer a. Then \(M_k\) is a (graded) \({{\,\mathrm{{\mathbb {F}}}\,}}_2[t]\)-module, called the persistence module of the filtration.

The definition works analogously for a filtration of a topological space (assuming that the homology of the spaces changes at only finitely many filtration values). It can be shown that the operation of taking a persistence module of a filtration of a simplicial complex (or a topological space) is functorial. Hence, persistence modules are algebraic invariants of filtrations.

Since \({{\,\mathrm{{\mathcal {K}}}\,}}\) is finite, the persistence module \(M_k\) is finitely generated as an \({{\,\mathrm{{\mathbb {F}}}\,}}_2[t]\)-module. As \({{\,\mathrm{{\mathbb {F}}}\,}}_2[t]\) is a principal ideal domain, \(M_k\) decomposes into summands each generated by a single element, uniquely up to (graded) isomorphism and permutation of summands. Hence, we can write

$$\begin{aligned} M_k\cong \left( \bigoplus _{a\in G_F}{{\,\mathrm{{\mathbb {F}}}\,}}_2[t]\right) \oplus \left( \bigoplus _{b\in G_T}{{\,\mathrm{{\mathbb {F}}}\,}}_2[t]/\langle t^{d_b}\rangle \right) , \end{aligned}$$

where \(G_F\) is the subset of chosen generators that are free and \(G_T\) is the subset of generators that are torsion. In particular, any element in \(G_F\) or \(G_T\) will have a non-zero entry in exactly one summand of the decomposition in Eq. (15). We call the integer n indexing this entry the degree of that element.

Definition 14

Let \(M_k\) be a persistence module that decomposes as above. Let \(\textrm{deg}:G_F\cup G_T\rightarrow {{\,\mathrm{{\mathbb {N}}}\,}}_0\) be the function mapping each element to its degree. The barcode of \(M_k\) is defined to be the multiset

$$\begin{aligned} {\mathcal {B}}:=\{(c(\textrm{deg}(a)),\infty )\,\vert \,a\in G_F\}\cup \{(c(\textrm{deg}(a)),c(\textrm{deg}(a)+d_a))\,\vert \,a\in G_T\}. \end{aligned}$$

We call the elements of \({\mathcal {B}}\) bars, the first coordinate of each bar its birth-value, the latter coordinate its death-value and the absolute difference of the coordinates its persistence.

A matching of barcodes \({\mathcal {B}}\) and \({\mathcal {B}}'\) is a partial injection \(\varpi :{\mathcal {B}}\hookrightarrow {\mathcal {B}}'\). The bottleneck distance between \({\mathcal {B}}\) and \({\mathcal {B}}'\) is defined to be

$$\begin{aligned} d_{BD}\left( {\mathcal {B}},{\mathcal {B}}'\right) :=\inf _\varpi \,\max \left\{ \max _{a\in \textrm{dom}\,\varpi }\left\| a-\varpi (a)\right\| _\infty ,\max _{(x,y)\not \in \textrm{dom}\,\varpi }\frac{y-x}{ 2},\max _{(x,y)\not \in \textrm{im}\,\varpi }\frac{y-x}{ 2}\right\} , \end{aligned}$$

where the infimum is taken over all possible matchings and elements of a barcode are viewed as elements of \({{\,\mathrm{{\mathbb {R}}}\,}}^2\) (we assume \(\infty -\infty =0\)). Here, \(\textrm{dom}\, \varpi \) is the domain of \(\varpi \), i.e. the set of inputs at which \(\varpi \) is defined.

The bottleneck distance defines a metric on the space of barcodes (Oudot 2015). This metric is stable in the following sense:

Theorem 15

(e.g. Corollary 3.6 in Oudot 2015) Let \({{\,\mathrm{{\mathcal {K}}}\,}}\) be a simplicial complex and let \(g,g':{{\,\mathrm{{\mathcal {K}}}\,}}\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\) be functions defining filtrations of \({{\,\mathrm{{\mathcal {K}}}\,}}\), and subsequently persistence modules \(M_k\) and \(M'_k\), and barcodes \({\mathcal {B}}\) and \({\mathcal {B}}'\). Then

$$\begin{aligned} d_{BD}\left( {\mathcal {B}},{\mathcal {B}}'\right) \le \left\| g-g'\right\| _\infty . \end{aligned}$$

Henceforth, we write \(\textrm{PH}_k(g)\) to denote the k-dimensional persistent homology (which can equivalently be summarised by a barcode or a persistence module) of a simplicial complex or a topological space filtered by a function g.

5.2.3 Persistent Homology of Random Data

In this section, we study the persistent homology of the posterior distributions of the parameter inferences of Sect. 5.1. Note that simplicial complexes, filtrations and persistent homology can also be employed to compare biological models a priori (i.e. with no dependence on measurement data) (Vittadello and Stumpf 2020).

We demonstrate that filtering a Vietoris–Rips complex for a fixed value L by a function \(g'\), as described in Sect. 5.2.2, yields more discriminative power. Here, we pick \(g'\) to be an estimated probability density function. These filtrations turn out to be highly discriminative between the mutants and offer novel insight at the biological level. While a Vietoris–Rips filtration is based entirely on distances, the construction we employ, using a Vietoris–Rips complex at a fixed parameter value and then filtering it by a probability density function (pdf), places an emphasis on density. The information encoded is directly related to the probability distribution, and the resulting barcodes will stabilise as the sample size increases (Theorem 3.5.1 in Rabadan and Blumberg 2020). Furthermore, the chosen construction is stable with respect to outliers. By contrast, in a Vietoris–Rips filtration, the bars in the resulting barcodes converge towards zero length as the sample size increases, and a single outlier, even in a large sample, can change a barcode drastically.

Initially, we assume that we are given a probability density function \(p:{{\,\mathrm{{\mathbb {R}}}\,}}^m\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\). This pdf defines a filtration \({\mathcal {T}}\) of \({{\,\mathrm{{\mathbb {R}}}\,}}^m\) by the sub-level sets of \(-p\) (equivalently, the super-level sets of p), via \({\mathcal {T}}_L=\left\{ x\in {{\,\mathrm{{\mathbb {R}}}\,}}^m\,\vert \, -p(x)\le L\right\} \). For \(L'\le L\) we then have \({\mathcal {T}}_{L'}\subseteq {\mathcal {T}}_{L}\). Such a filtration is visualised for the case \(m=1\) in Fig. 6. By analogy with filtrations of simplicial complexes, we can theoretically compute a barcode for each such topological filtration and investigate the resulting bottleneck distances.

For each (homological) dimension, these barcodes provide a topological signature of a posterior distribution. We point out that although this signature is not a sufficient statistic, it is effective at distinguishing between posteriors corresponding to distinct mutants in our application. In particular, for any pdf \(p_1:{{\,\mathrm{{\mathbb {R}}}\,}}^d\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\), the pdf \(p_2(x)=p_1(x-x_0)\) gives rise to the same topological signature for any constant \(x_0\in {{\,\mathrm{{\mathbb {R}}}\,}}^d\). Thus, rather than comparing the location of probability density in parameter space, in the context of a Bayesian inference, this topological signature captures the quality of the certainty we have in parameter values, irrespective of their location.

For example, bars in the \(H_0\)-barcode encode the density (as negative of the birth-value) and the prominence (as the persistence) of the modes of a pdf. Similarly, Morse Theory tells us that for a (smooth) pdf on \({{\,\mathrm{{\mathbb {R}}}\,}}^d\), the \((d-1)\)th barcode captures local minima by their density (as death-value) and the depth of their basin of attraction (as persistence).

In order to conduct such a topological analysis, two questions must be addressed:

  1. How can we approximate the topology of a graph of a probability density combinatorially (i.e. in a manner amenable to the application of discrete computational methods) if only point samples are available?

  2. Can we test the statistical significance of the resulting bottleneck distances?

To resolve the first question, we will employ a result from Bobrowski et al. (2017) that relies on the concept of kernel density estimation (KDE). In order to test the significance of the resulting bottleneck distance, we will use an empirical p-value estimate.

Definition 16

Let \({\textbf{v}}=\{v_1,\dots ,v_N\}\subseteq {{\,\mathrm{{\mathbb {R}}}\,}}^m\) be a set of N samples drawn independently from a probability distribution governed by the density function \(p:{{\,\mathrm{{\mathbb {R}}}\,}}^m\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\). Let \(K:{{\,\mathrm{{\mathbb {R}}}\,}}^m\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\) be a smooth, unimodal, symmetric probability density function whose support is contained in the unit ball centred at 0. Then

$$\begin{aligned} {\hat{p}}_b(x)=\frac{1}{ Nb^m}\sum _{i=1}^NK\left( \frac{x-v_i }{ b}\right) \end{aligned}$$

is called a kernel density estimate (KDE) of p with bandwidth b.

On each sample \(v_i\) we place a copy of the kernel and average these copies; the bandwidth b controls the width of each copy, that is, how tightly the probability mass is concentrated around \(v_i\). Loosely speaking, if b is too large, then the resulting estimate underfits a histogram of the data, while if it is too small, then the estimate overfits the histogram (see Fig. 7). The optimal bandwidth decreases as the sample size increases, and there are standardised ways of picking an optimal bandwidth when p is unknown (Henderson and Parmeter 2012).
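One such standardised recipe, sketched below under the assumption that sklearn is used for the density estimation (as later in this section), is to choose the bandwidth maximising the cross-validated log-likelihood of the kernel density estimate; the candidate grid and the number of folds are arbitrary illustrative choices.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def select_bandwidth(samples, candidates=np.linspace(0.05, 2.0, 40)):
    # samples: (N, m) array of draws from the unknown density p
    search = GridSearchCV(KernelDensity(kernel="epanechnikov"),
                          {"bandwidth": candidates}, cv=5)
    search.fit(samples)
    return search.best_params_["bandwidth"]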

Fig. 6

(Color figure online) An example of a super-level-set filtration of the graph of a density function \(p:{{\,\mathrm{{\mathbb {R}}}\,}}^1\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\). This is equivalent to a sub-level-set filtration of \(-p\)

Fig. 7

(Color figure online) A probability density function on \({{\,\mathrm{{\mathbb {R}}}\,}}^1\) (black line) with 1000 samples (black dashes). We show kernel density estimates with bandwidths 0.6 (blue line) and 1.4 (green line). The ideal bandwidth is approximately 1

Given such an i.i.d. sample \({\textbf{v}}=\{v_1,\dots ,v_N\}\subseteq {{\,\mathrm{{\mathbb {R}}}\,}}^m\) from our probability density function p and an optimal bandwidth b, we can construct a Vietoris–Rips complex with fixed parameter b (equalling the bandwidth)

$$\begin{aligned} \textrm{VR}_b({\textbf{v}}):=\left\{ \{v_0,\dots ,v_k\}\subseteq {\textbf{v}}\,\vert \,\Vert v_i-v_j\Vert \le b\,\forall i,j\right\} . \end{aligned}$$

For the sake of brevity, let \({{\,\mathrm{{\mathcal {K}}}\,}}={\text {VR}}_b({\textbf{v}})\). The KDE \({\hat{p}}_b\) of p based on \({\textbf{v}}\) then extends to a function on \({\mathcal {K}}\) via

$$\begin{aligned} {\hat{p}}_b(\{v_0,\dots ,v_k\}):=\min \left\{ {\hat{p}}_b(v_0),\dots ,{\hat{p}}_b(v_k)\right\} . \end{aligned}$$

In turn, the extended function \({\hat{p}}_b\) defines a filtration \(\{{{\,\mathrm{{\mathcal {K}}}\,}}_L\}_{L\in {{\,\mathrm{{\mathbb {R}}}\,}}}\) of \({{\,\mathrm{{\mathcal {K}}}\,}}\) by

$$\begin{aligned} {{\,\mathrm{{\mathcal {K}}}\,}}_L:=\left\{ \{v_0,\dots ,v_k\}\,\vert \,-{\hat{p}}_b(\{v_0,\dots ,v_k\})\le L\right\} . \end{aligned}$$

We seek to relate the persistent homology of the filtration of simplicial complexes \({{\,\mathrm{{\mathcal {K}}}\,}}_L\) to the persistent homology of the filtration of topological spaces \({\mathcal {T}}_L\).

In order to use results from Bobrowski et al. (2017), we introduce some notation. For a function \(f:{{\,\mathrm{{\mathcal {K}}}\,}}\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\) and \(\eta >0\) define \(f_{\lfloor \eta \rfloor }(\sigma ):=2\eta \lfloor f(\sigma )/(2\eta )\rfloor \). Then

Theorem 17

(Theorem 3.7 in Bobrowski et al. 2017) Let \(p:{{\,\mathrm{{\mathbb {R}}}\,}}^m\rightarrow {{\,\mathrm{{\mathbb {R}}}\,}}\) be a smooth bounded pdf with finitely many critical points. Let \({\hat{p}}\) be a KDE with bandwidth b based on N i.i.d. samples of p and let \({{\,\mathrm{{\mathcal {K}}}\,}}\) be a simplicial complex as above. Assume \(b\rightarrow 0\) and \(Nb^m\rightarrow \infty \). Then for any \(0\le k \le m\), we have

$$\begin{aligned} \textrm{Pr}\left( d_{BD}\left( \textrm{PH}_k\left( p\right) , \textrm{PH}_k\left( {\hat{p}}_{\lfloor \eta \rfloor }\right) \right) \le 5\eta \right) \ge 1-3\eta ^* Ne^{-C_\eta Nb^m}, \end{aligned}$$

where for \(p_{\max }:=\sup _{x\in {{\,\mathrm{{\mathbb {R}}}\,}}^m}p(x)\) we define

$$\begin{aligned} \eta ^*:=\left\lceil p_{\max }/2\eta \right\rceil \quad {\text {and}}\quad C_\eta :=\frac{(\eta /2)^2}{ 3p_{\max }+\eta /2}. \end{aligned}$$

Theoretically, the above theorem can be exploited for testing the null hypothesis \({\textbf{H}}_0:\textrm{PH}_k\left( p\right) =\textrm{PH}_k\left( p'\right) \) for two distributions P and \(P'\) with associated densities p and \(p'\), as the result enables us to establish a bound on how large a bottleneck distance can be explained by sampling noise at a given significance level. However, we estimate that to use this theorem for showing that the bottleneck distances between posterior distributions associated with the wild-type and the four mutants are significant, we must sample at least \(1.5\times 10^7\) points per distribution. This makes persistent homology computation infeasible.

At the same time, we observe that there is little change in the bottleneck distances between the barcodes resulting from the wild-type’s and the four mutants’ posterior distributions when resampling point clouds containing as few as 200 points. This leads us to think that the true p-value associated with the null hypothesis \({\textbf{H}}_0:\textrm{PH}_k\left( p\right) =\textrm{PH}_k\left( p'\right) \), where p and \(p'\) are posterior densities corresponding to the wild-type and a mutant is possibly much lower than the upper bound derived by appealing to Theorem 17. One factor that may explain this discrepancy is that while our distributions are technically distributions on \({{\,\mathrm{{\mathbb {R}}}\,}}^3\), they have compact support. Similarly, major sources of instability for KDE, and subsequently for the filtration of density functions, are modes linked to outliers, while repeated simulations suggest that in our case all density functions are unimodal. Together, these aspects imply that the computed barcodes could converge to the barcode obtained by filtering the unknown density function at a faster rate than in the general setting of Theorem 17.

Henceforth, we use the method of constructing a filtration from a point cloud proposed in Bobrowski et al. (2017), which is provably well behaved asymptotically, but we estimate significance differently. To do this, we opt for a Monte Carlo p-value estimate, also known as the empirical p-value (see, e.g., Davison and Hinkley 1997). For each mutant, we sample \(\beta \) additional point clouds of size n from the posterior distribution. For the first mutant (or the wild-type) under investigation, we call the original point cloud \({\textbf{v}}\) and let \({\textbf{v}}_i\), \(i=1,\ldots ,\beta \), denote the \(\beta \) additional point clouds of size n obtained by repeated sampling. Define \({\textbf{v}}'\) and \({\textbf{v}}'_i\) analogously for a distinct mutant. Let \(d_i=d_{BD}\left( \textrm{PH}_k\left( {\hat{p}}\right) ,\textrm{PH}_k\left( {\hat{p}}_i\right) \right) \), where \({\hat{p}}_i\) is the density estimate obtained from \({\textbf{v}}_i\), and define \(d_i'\) analogously. Suppose \(d=d_{BD}\left( \textrm{PH}_k\left( {\hat{p}}\right) ,\textrm{PH}_k\left( {\hat{p}}'\right) \right) \) is the j-th largest element in the multiset \(\left\{ d_i\right\} _{i=1}^\beta \cup \{d\}\) and the \(j'\)-th largest element in \(\left\{ d'_i\right\} _{i=1}^\beta \cup \{d\}\); then

$$\begin{aligned} {\hat{\pi }}=\min \left\{ \frac{\beta +1-j}{ \beta +1},\frac{\beta +1-j' }{ \beta +1}\right\} \end{aligned}$$

is a p-value estimate for the hypothesis test \({\textbf{H}}_0:\textrm{PH}_1\left( p\right) =\textrm{PH}_1\left( p'\right) \). The resulting p-value estimates, for each pair of mutants and the wild-type, can be found in Table 1. It is likely that these p-value estimates overestimate the actual values, but they allow us to reject all null hypotheses at a significance level of 0.05 (North et al. 2002).
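The estimate \({\hat{\pi }}\) can be computed directly from the resampled distances; the snippet below is a direct transcription of the formula above, using the ranking convention stated there (ties ignored).

import numpy as np

def empirical_p_value(d, d_i, d_i_prime):
    # d: observed bottleneck distance between the two groups;
    # d_i, d_i_prime: the beta resampled distances within each group
    beta = len(d_i)
    j = 1 + int(np.sum(np.asarray(d_i) > d))             # rank of d, 1 = largest
    j_prime = 1 + int(np.sum(np.asarray(d_i_prime) > d))
    return min((beta + 1 - j) / (beta + 1),
               (beta + 1 - j_prime) / (beta + 1))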

Table 1 Bottleneck distances between the \(H_1\) barcodes obtained from the super-level-set filtrations of the KDEs (top) and their respective p-value estimates (bottom)

The results of the topological data analysis (Table 1) quantify the differences between the Linear ERK model parameter posteriors for the WT and the mutants, and show that the SSDD mutant kinetics are the most different from the WT and the other mutants. This biological result raises the question of the suitability of using the SSDD variant as a replacement for wild-type MEK activated by Raf; we suggest this should be investigated with further experimental studies. The previous work by Yeung et al. (2019) found that \(\pi \), the processivity parameter, of E203K differed the most from WT MEK. Here we extend and complement their analysis by comparing the three parameters together as a point cloud.

It remains to address the practical computability of all the constructions involved. As mentioned in the previous section, we use the statistical software STAN (in particular, PyStan) to sample from the posterior distributions. The sampling is approximate, via Hamiltonian Monte Carlo (Carpenter et al. 2017), but we can verify via output summaries and trace plots of the Markov chains involved that all chains converged close to their stationary distributions during the warm-up phase.

In order to construct the KDE, we used the KernelDensity class of the Python package sklearn. We used the Epanechnikov kernel, which satisfies the hypotheses on the kernel in Theorem 17. As an initial guess for the bandwidth, this package uses a rule of thumb proportional to Silverman’s method, which we then cross-validate and plot against a histogram of our samples for each marginal distribution. Given experimental data, we construct a Vietoris–Rips complex with radius b, equal to the bandwidth from the KDE, using the Python package dionysus (version 2). We compute the resulting bottleneck distances using the package persim.
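A sketch of this pipeline, assuming the packages and interfaces named above (sklearn, dionysus version 2 and persim), is given below; the function names, the maximal simplex dimension and the conversion of the diagrams to arrays are our own illustrative choices.

import numpy as np
import dionysus as d
import persim
from sklearn.neighbors import KernelDensity

def density_filtration_diagram(points, bandwidth, max_dim=2, hom_dim=1):
    # points: (N, m) array sampled from a posterior distribution
    kde = KernelDensity(kernel="epanechnikov", bandwidth=bandwidth).fit(points)
    dens = np.exp(kde.score_samples(points))   # estimated density at each sample
    # Vietoris-Rips complex at the fixed radius b equal to the bandwidth
    rips = d.fill_rips(points.astype(np.float32), max_dim, bandwidth)
    # refilter each simplex by the maximum of -density over its vertices
    filt = d.Filtration()
    for s in rips:
        filt.append(d.Simplex(list(s), max(-dens[v] for v in s)))
    filt.sort()
    pers = d.homology_persistence(filt)
    dgms = d.init_diagrams(pers, filt)
    return np.array([[pt.birth, pt.death] for pt in dgms[hom_dim]])

# Comparison of two posterior samples wt and mutant, with a common bandwidth b:
# dist = persim.bottleneck(density_filtration_diagram(wt, b),
#                          density_filtration_diagram(mutant, b))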

6 Conclusion

We presented an exhaustive mathematical analysis that supports the three main findings presented in Yeung et al. (2019): model reduction, analysis of the model parameters and comparison of mutant kinetics. Yeung et al. observed that certain values of parameter combinations from the Full ERK model fit the data, which in turn motivated the creation of a reduced model, the Linear ERK model. We confirmed the derivation of the Linear ERK model using algebraic QSS and the validity of the QSS approximation using the QSS variety. We performed systematic identifiability analyses on all three models, which is a prerequisite for meaningful parameter estimation. We found that the Full, Rational and Linear ERK models are structurally identifiable. We then improved a previous definition of practical identifiability and showed that the Linear ERK model is practically identifiable but that the Rational and Full ERK models are not, which is consistent with Yeung et al. (2019). Hitherto, testing structural identifiability has been limited to small models due to computational costs; however, recent work significantly improves the computation of structural identifiability, enabling the analysis of larger models (Dong et al. 2021; Villaverde et al. 2019). We remark that there are many realistic models, such as this ERK study or those by the group of Marisa Eisenberg, that benefit from existing methods and motivate the development of new identifiability tools.

We reproduced the parameter inference for the wild-type and mutant MEK experiments. While Yeung et al. visually inspected samples of the posteriors, here we quantified these point clouds with computational algebraic topology. In future, it would be interesting to further explore the relationship between topological analysis and practical identifiability, and how they may be used to inform experimental design (Apgar et al. 2010; Hagen et al. 2013). Throughout, we showcase the potential role of algebra, geometry and topology in systems and synthetic biology. Complementary to the analysis here is the inference of models in systems and single-cell biology that relies on algebra and topology (Wang et al. 2019; Vittadello and Stumpf 2021; Rizvi 2017). We believe that topological data analysis in combination with modelling and parameter estimation is a promising area for the sciences (Thorne et al. 2022; Carriere et al. 2018; Suzuki 2021). We hope our analysis of this ERK case study will motivate other systems biologists to partner with algebraists and topologists to analyse dynamical systems together with their experimental setup and data.