1 Introduction

mRNA transfection is the process of introducing mRNA into a living cell. mRNA delivery has become increasingly interesting for biomedical applications because it enables treatment of diseases by means of targeted expression of proteins and it is transient, avoiding the risk of permanently integrating into the genome (see e. g. Sahin et al. 2014). The most prominent applications of mRNA transfection are the mRNA-based vaccines that are successfully used to fight the COVID-19 pandemic (DeFrancesco 2020; Borah et al. 2021). These successful use cases demonstrate the importance of a precise understanding of the dynamics of the underlying processes.

We aim at facilitating further insights into the determinants of the mRNA delivery process and the translation kinetics through the use of mechanistic modeling and parameter inference for such models from experimental data. Therefore, we consider data from an mRNA transfection experiment using fluorescence reporters and fluorescence microscopy which is one of the few ways to measure quantities within a living cell over time (i. e. keeping it alive is necessary). Due to the discrete nature of the molecular species within a cell and due to the fact that random fluctuations play a key role (Elowitz et al. 2002; Raj and van Oudenaarden 2008), a continuous-time, discrete-space Markov process, also called a Markov jump process (MJP), for which the dynamics are described by the so-called chemical master equation (CME), is widely accepted to be an appropriate stochastic description of the biochemical processes within a cell (Schnoerr et al. 2017; Gillespie 1992). However, parameter inference for MJPs is computationally very demanding and often infeasible (see e. g. Warne et al. 2019). Therefore, several other representations of the biochemical kinetics have been developed. Higham (2008) provides an accessible introduction to those. To some extent they can be considered as approximations to the corresponding MJP. The most commonly used representation is the reaction rate equation (RRE) which is a system of ordinary differential equation (ODEs) and thus provides a deterministic and state-continuous description of the kinetics. One approach that preserves the stochastic nature of the underlying process is the approximation by Itô diffusion processes. These are continuous-time, continuous-space stochastic processes described by Itô-type stochastic differential equation (SDEs).

Single-cell fluorescence data from transfection experiments has been analyzed based on ODE modeling in several previous studies, e. g. in Ligon et al. (2014), Leonhardt et al. (2014), Fröhlich et al. (2018), and Reiser et al. (2019). In all of these studies, several parameters of the considered ODE models for the translation kinetics after mRNA transfection were not identifiable from the experimental data. Moreover, the quality of the parameter estimates, i. e. whether the true kinetic rates are adequately captured, is unclear. Using models that explicitly account for stochasticity can help improve our ability to determine kinetic parameters from experimental data (Munsky et al. 2009). There are two important notions with respect to parameter identifiability (Raue et al. 2009): Structural identifiability exclusively considers the structure of the model including the model of the observations and answers the question whether the parameters can be uniquely determined if we are given perfect data, i. e. an infinite amount of data observed without measurement error and continuously in time. Whereas practical identifiability is concerned with the question whether the parameters can be determined from a specific data set (which is always finite and usually subject to measurement error). Browning et al. (2020) have recently compared parameter identifiability for ODE and SDE modeling approaches for example models from different contexts based on simulated data and showed that SDE modeling does improve the identifiability for the studied models.

Here, we formulate an SDE model of the translation kinetics after mRNA transfection and compare its parameter identifiability to that of the corresponding ODE model. This approach serves two purposes at the same time: First of all, we aim to get further biological insight into the kinetic processes. Second, we investigate and showcase the benefit of SDE models over ODE models with respect to parameter identifiability on the specific biological example. In doing so, we build on the work of others: While the same ODE model has already been considered e. g. by Fröhlich et al. (2018) and Ballnus et al. (2017), we contrast it with the corresponding SDE dynamics. Inference from fluorescence data for SDE models has also been conducted e. g. in Heron et al. (2007), Finkenstädt et al. (2008), and Komorowski et al. (2009), however for an experimental setup that also included the transcription process, other than in the experiments considered here. We add the focus on parameter identifiability and the exclusive use of diffuse priors as novel aspects.

This article is structured as follows: In Sect. 2, we describe the experimental data that motivated the study. In Sect. 3, we present the biochemical reaction network which we want to consider for the translation kinetics after mRNA transfection and formulate its ODE and SDE representation. After stating the assumed model of the observations and summarizing the parameters that we would like to infer from these observations in Sect. 4, we consider several approaches to assess structural identifiability of the parameters for both model types in Sect. 5. We define the parameter posteriors for both modeling approaches and study the practical identifiability of the parameters based on simulated and experimental data in Sect. 6, and conclude with a summary and discussion of our findings in Sect. 7. All relevant code to perform the presented analysis is available on Github: https://github.com/fuchslab/Translation_kinetics_after_transfection.

2 Experimental data

We consider data that has previously been analyzed (based on ODE modeling) and published in Fröhlich et al. (2018). The data was generated in an experiment where human hepatoma epithelial cells from the cell line HuH7 were transfected with mRNA encoding a green fluorescent protein (GFP). The cells were fixed on micro patterned protein arrays and time lapse microscopy images of the cells were taken every 10 minutes over the course of at least 30 hours (i.e. there are at least 180 measurements per cell). For the first hour, the cells were incubated with mRNA lipoplexes which consist of (cationic) lipid membrane encapsulating GFP coding mRNA. Afterwards, the cells were washed with cell culture medium such that no further lipoplex uptake occurred. The time point at which the lipoplexes were taken up, dissolved and released the mRNA as well as the number of mRNA molecules released are unknown. For the considered experimental setup, individual lipoplexes carry on average 350 mRNAs and the number of lipoplexes entering a single cell seems to be very low (Leonhardt et al. 2014).

The released mRNA was translated into a fluorescent protein which caused the cells to fluoresce. For each image taken during the experiment, the fluorescence intensity is integrated over squares occupied by one cell in order to obtain one value for the mean fluorescence intensity per cell and time point (see Fröhlich et al. 2018, for further details about the image analysis and Fig. 1 therein for an example image).

The experiment was conducted with two different types of GFP that differ in their protein lifetime: enhanced GFP (eGFP) and a destabilized enhanced GFP (d2eGFP). The data set contains measurements for more than 800 cells for each type of GFP. Some trajectories of the mean fluorescence intensity are displayed in Fig. 1.

Fig. 1
figure 1

Trajectories of the mean fluorescence intensity for seven cells from the mRNA transfection experiment in Fröhlich et al. (2018) for eGFP and d2eGFP (April 27, 2016), respectively

It was shown before that ODE models of the translation kinetics of an individual cell are not globally identifiable with the available experimental data as described above (Ballnus et al. 2017). Several of the ODE model parameters cannot be uniquely determined based on one observed fluorescence trajectory. Fröhlich et al. (2018) use a mixed-effect ODE model in order to incorporate the translation kinetics of several cells and data for both different types of GFP (eGFP and d2eGFP). Through this approach, they are able to improve parameter identifiability (by breaking the symmetry between the degradation rate constants); however, their approach is computationally very intense, necessitates distribution assumptions about the random effects, required conducting the experiment with two types of GFP, and still leaves several parameters non-identifiable. Here, we are interested in the question whether the use of an SDE model can improve the parameter identifiability even when only one fluorescence trajectory is observed.

3 Mathematical models of the translation kinetics

Following the introduction of mRNA into a living cell, i. e. after mRNA transfection, the information contained in the mRNA molecules is used for the production of (green fluorescent) proteins, which is the translation process. We will focus on modeling the translation kinetics of one cell in order to study parameter identifiability based on one observed fluorescence trajectory. Therefore, we only consider the (released) mRNA and the GFP molecules explicitly:

As the number of lipoplexes entering a cell is small (Leonhardt et al. 2014) and as most observed fluorescence trajectory do not show a bi-phasic behaviour, we assume that all mRNA molecules (within one cell) are released at once from the lipoplexes and denote this initial time point by \(t_0\). Before \(t_0\), there are neither mRNA nor GFP molecules, and at \(t_0\), an amount of \(m_0\) mRNA molecules is released, i.e.

$$\begin{aligned} \varvec{X}(t) \equiv \begin{pmatrix}0\\ 0\end{pmatrix} \text { for } t < t_0 \quad \text { and } \varvec{X}(t_0) = \begin{pmatrix}m_0\\ 0\end{pmatrix}. \end{aligned}$$

Conceivable extensions of this basic model are e. g. to include enzymatic degradation of the mRNA and/or the protein, ribosomal binding to the mRNA for translation, and a maturation step of the protein. However, we will only consider the basic configuration as described above.

3.1 Markov jump process

Assuming that the matter within the cell is well-stirred and in thermal equilibrium, an MJP is regarded to be the most adequate representation of this system after \(t_0\). In our model, there are three possible reactions:

$$\begin{aligned} X_1&\overset{\theta _1}{\longrightarrow } \emptyset&\text {degradation of mRNA,}\\ X_1&\overset{\theta _2}{\longrightarrow } X_1 + X_2&\text {translation,} \\ X_2&\overset{\theta _3}{\longrightarrow } \emptyset&\text {degradation of GFP} \end{aligned}$$

with kinetic rate parameters \(\theta _1\), \(\theta _2\), and \(\theta _3\). Given the state vector \(\varvec{X}(t)=\left( X_1(t),X_2(t)\right) ^T\), the propensity functions of the three reactions are

$$\begin{aligned} f_1(\varvec{X}(t)) = \theta _1 X_1, \quad f_2(\varvec{X}(t)) = \theta _2 X_1, \quad \text {and} \quad f_3(\varvec{X}(t)) = \theta _3 X_2, \end{aligned}$$

i. e. the probability that reaction i occurs within the next time interval of length \(\Delta t\) is approximately \(f_i(\varvec{X}(t))\Delta t\) for \(i=1,2,3\).

If we denote the probability distribution of the random variable \(\varvec{X}(t)\) by

$$P_{i,j}(t) = \mathbb {P}(X_1(t) = i, X_2(t)=j)$$

for a fixed initial condition on \(\varvec{X}(t_0)\), the corresponding CME reads

$$\begin{aligned} \frac{\partial P_{i,j}(t)}{\partial t} =&\, \theta _1(i+1) P_{i+1,j}(t) + \theta _2 i P_{i,j-1}(t) +\theta _3(j+1) P_{i,j+1}(t) \\&- (\theta _1 i + \theta _2 i + \theta _3 j) P_{i,j}(t). \end{aligned}$$

Although the system contains only first-order reactions, there is no known closed-form solution to the CME. Thus, the generation of model trajectories and the evaluation of the likelihood of data given parameter values requires computationally demanding stochastic simulations. This fact impedes efficient inference for this MJP and leads to the consideration of other modeling approaches.

3.2 ODE model

The RRE is a system of ODEs that provides a deterministic and state-continuous description of biochemical kinetics and can be derived as the large-volume limit of the corresponding MJP (see e. g. Kurtz 1972). The following system of ODEs is a deterministic approximation of the MJP modeling the dynamics as described above:

$$\begin{aligned} \frac{\mathrm {d}\varvec{X}(t)}{\mathrm {d}t}&= \begin{pmatrix} -\theta _1 X_1(t) \\ \theta _2 X_1(t) -\theta _3 X_2(t) \end{pmatrix} \quad \text { for } t \ge t_0.\ \end{aligned}$$
(1)

This system has previously been formulated and analyzed e. g. in Leonhardt et al. (2014). It admits the solution

$$\begin{aligned} X_1(t)&= m_0\exp \left( -\theta _1(t-t_0)\right) ,\nonumber \\ X_2(t)&= {\left\{ \begin{array}{ll} \frac{\theta _2 m_0}{\theta _3 - \theta _1}\left( e^{-\theta _1 (t-t_0)}-e^{-\theta _3 (t-t_0)} \right) &{}, \text {for } \theta _1\ne \theta _3,\\ \theta _2 m_0 (t-t_0)e^{-\theta _3 (t-t_0)} &{}, \text {for } \theta _1=\theta _3. \end{array}\right. } \end{aligned}$$
(2)

Note that the solution for \(X_2(t)\) is symmetric in the parameters \(\theta _1\) and \(\theta _3\). Moreover, note that while it is common for the RRE to describe molecule concentrations; here, we let \(\varvec{X}(t)\) denote molecule numbers. However, since the model that we consider here contains only first-order reactions, this does not affect the interpretation of the kinetic rate parameters (i. e. Eq. (1) would look the same for concentrations).

3.3 SDE model

While the ODE model of the translation kinetics after mRNA transfection had been considered before, we are the first to formulate the stochastic but state-continuous approximation to the MJP in Sect. 3.1 based on an (Itô) diffusion process. The approximating diffusion process is described by an SDE which is known as the chemical Langevin equation (CLE) in the context of biochemical kinetics (Gillespie 2000). The CLE is derived based on the changes in molecule numbers caused by the three possible reactions and on the corresponding propensities and reads:

$$\begin{aligned} \mathrm {d}\varvec{X}(t)&= \begin{pmatrix} -\theta _1 X_1(t) \\ \theta _2 X_1(t) -\theta _3 X_2(t) \end{pmatrix}\,\mathrm {d}t + \begin{pmatrix} \sqrt{\theta _1 X_1(t)} &{} 0 \\ 0 &{} \sqrt{\theta _2 X_1(t) + \theta _3 X_2(t)} \end{pmatrix}\,\mathrm {d}\varvec{B}(t) \end{aligned}$$
(3)

for \(t \ge t_0\) and where \(\varvec{B}(t)\) is a 2-dimensional standard Brownian motion. The solution of Eq. (3) is not explicitly known and will be approximated based on the Euler-Maruyama scheme as explained in Sect. A.1 of the supplementary material. See Fuchs (2013) for a concise introduction to diffusion processes and several approaches to derive general diffusion approximations.

4 Model parameters and observations

4.1 Model of the observations

In the experiment described in Sect. 2, neither the amount of mRNA molecules nor that of GFP molecules can be measured directly. Instead, a fluorescence signal is observed which is composed of a background fluorescence and a signal which is proportional to the amount of GFP molecules. Moreover, previous studies assumed a constant offset and multiplicative measurement noise in the recorded fluorescence trajectories (Fröhlich et al. 2018). Therefore, denoting a trajectory of mean fluorescence intensity observed at time points \(t_k\), for \(k=1,\dots ,K\), by \(\lbrace y_{k}\rbrace _{k=1,\dots ,K}\), we assume that

$$\begin{aligned} y_{k} = \left( \mathrm {scale} \cdot {} X_2(t_k) + \mathrm {offset}\right) \cdot \epsilon _{k}, \quad \epsilon _{k} \sim \log \mathcal {N}(0,\sigma ^2), \end{aligned}$$

where the random variables \(\epsilon _k\) are independent.

Note that the observations depend only on the amount \(X_2\) of GFP molecules, but not directly on the amount \(X_1\) of mRNA molecules.

4.2 Model parameters

Based on the observations \(\lbrace y_{k}\rbrace _{k=1,\dots ,K}\), we aim to infer the following unknown parameters:

  • The three kinetic parameters \(\varvec{\theta }= (\theta _1, \theta _2, \theta _3)\) that denote the rate constants for mRNA degradation, translation, and GFP degradation,

  • The initial amount \(m_0\) of mRNA molecules and the time point \(t_0\) at which it is released,

  • The scaling factor scale and the offset for the fluorescence signal,

  • And the standard deviation \(\sigma \) of the measurement errors.

In the remainder of this work, we investigate how far such inference is possible in theory and practice.

5 Structural identifiability analysis

Our main interest lies in the question which of the model parameters for our two model types (ODE and SDE) can be inferred from the experimental data as described in Sects. 2 and 4. Here, we first focus on the parameters \(\varvec{\theta }\), \(m_0\), \(\mathrm {scale}\), and \(\mathrm {offset}\) that drive the dynamics of the process and the fluorescence signal. We analyze the structural identifiability which only considers the model equations of the process dynamics and the observation equation (not the actual data) and assumes that we are in a perfect data situation, i. e. we have an infinite amount of data observed without measurement error (Raue et al. 2009). Plainly speaking, structural identifiability analysis answers the question whether different parameter combinations can lead to the same model output. While for ODE models, there are analytical methods to assess structural identifiability, no such methods exist for SDE models. Therefore, we consider several approaches that do not directly assess the structural identifiability for our SDE model, but answer related questions. In the following subsections, we consider a transformed version of both model types, we make use of a surrogate model and the open source software DAISY as has recently been suggested by Browning et al. (2020), and finally we also study simulations of both model types.

5.1 Transformed models

We can reformulate the differential equations for both model types by setting

$$\begin{aligned} \varvec{Z}(t) = \begin{pmatrix}Z_1(t)\\ Z_2(t)\end{pmatrix}:=\begin{pmatrix}\frac{X_1(t)}{m_0}\\ \mathrm {scale}\cdot X_2(t) + \mathrm {offset}\end{pmatrix}, \end{aligned}$$

which means that

$$\begin{aligned} \varvec{Z}(t) \equiv \begin{pmatrix}0\\ \mathrm {offset}\end{pmatrix} \text { for } t < t_0, \quad \text { and } \varvec{Z}(t_0) = \begin{pmatrix}1\\ \mathrm {offset}\end{pmatrix}. \end{aligned}$$

Hence, the second component of the transformed process models the fluorescence signal which we assume to be observed, meaning \(y = Z_2\).

5.1.1 Transformed ODE model

For the ODE model in Eq. (1), the transformed model reads

$$\begin{aligned} \frac{\mathrm {d} \varvec{Z}(t)}{\mathrm {d} t}&= \begin{pmatrix} -\theta _1 Z_1(t) \\ \mathrm {scale}\, \theta _2 m_0 Z_1(t) - \theta _3( Z_2(t)- \mathrm {offset}) \end{pmatrix} \text { for } t\ge t_0, \end{aligned}$$
(4)

and has the solution

$$\begin{aligned} Z_1(t)&= \exp \left( -\theta _1(t-t_0)\right) ,\\ Z_2(t)&= {\left\{ \begin{array}{ll} \frac{\mathrm {scale}\,\theta _2 m_0}{\theta _3 - \theta _1}\left( e^{-\theta _1 (t-t_0)}-e^{-\theta _3 (t-t_0)} \right) + \mathrm {offset}&{}, \text {for } \theta _1\ne \theta _3,\\ \mathrm {scale}\,\theta _2 m_0 (t-t_0)e^{-\theta _3 (t-t_0)} + \mathrm {offset} &{}, \text {for } \theta _1=\theta _3. \end{array}\right. } \end{aligned}$$

The parameters \(\mathrm {scale}\)\(m_0\), and \(\theta _2\) appear only as a product. Thus, it can already be deduced from this equation that at most the product of the three parameters will be identifiable but not the three parameters individually (Fröhlich et al. 2018). Moreover, since only \(Z_2(t)\) is observed and it is symmetric in the parameters \(\theta _1\) and \(\theta _3\) (i. e. switching their values will lead to the same model output), these two parameters can at most be locally identifiable if no further constraints are imposed on them as done in Leonhardt et al. (2014).

5.1.2 Transformed SDE model

For the SDE model in Eq. (3), we apply the Itô formula (as stated in Sect. A.1 of the supplementary material) to obtain the transformed model

$$\begin{aligned} \mathrm {d} \varvec{Z}(t)&= \begin{pmatrix} -\theta _1 Z_1(t) \\ \mathrm {scale}\, \theta _2 m_0 Z_1(t) - \theta _3( Z_2(t)- \mathrm {offset}) \end{pmatrix} \,\mathrm {d} t \nonumber \\&\quad + \begin{pmatrix} \sqrt{ \frac{\theta _1}{{m_0}}Z_1(t)} &{} 0 \\ 0 &{} \sqrt{{\mathrm {scale}}}\sqrt{\mathrm {scale}\, \theta _2 m_0 Z_1(t) + \theta _3(Z_2(t)- \mathrm {offset})} \end{pmatrix} \,\mathrm {d}B_t \text { for } t\ge t_0. \end{aligned}$$
(5)

Note that here, the parameters \(\mathrm {scale}\) and \(m_0\) also appear outside the product \(\mathrm {scale}\, \theta _2 m_0\). Furthermore, there is no apparent symmetry between \(\theta _1\) and \(\theta _3\). Therefore, it might be possible to gain more information about the individual parameters from data for the SDE model than for the ODE model.

5.2 Using a surrogate model

Next, we want to use the open source software DAISY (Differential Algebra for Identifiability of SYstems, Bellu et al. 2007) to assess the structural identifiability of the parameters in the two models of the translation kinetics. DAISY is a software tool that implements a differential algebra algorithm to perform structural identifiability analysis for systems of polynomial or rational ODEs and that also allows to include unknown initial conditions. We use the transformed models from the previous subsection for the identifiability analysis. For the ODE model in Eq. (4), the analysis with DAISY is straight forward since it is intended for the use for ODE models. After applying DAISY, the obtained output shows that when considering the set of parameters \(\lbrace \varvec{\theta }, m_0, \mathrm {scale}, \mathrm {offset} \rbrace \), the model is non-identifiable. The DAISY output also reveals that this non-identifiability is due to the fact that the parameters \(\theta _1\) and \(\theta _3\) are only locally identifiable and the parameters \(\theta _2\), \(m_0\), and \(\mathrm {scale}\) are not individually identifiable, but only their product is. This confirms the assertions from the previous subsection. Moreover, we obtain that the remaining parameter \(\mathrm {offset}\) is structurally identifiable.

For SDE models, Browning et al. (2020) suggest to formulate a surrogate model based on the moment equations of the diffusion process. The moment equations are a system of ODEs, and thus, DAISY can be applied to this system. For the SDE (5), let \(M_{ij}(t)=\mathbb {E}\left[ (Z_1(t))^i(Z_2(t))^j \right] \) be the (mixed) moment of the diffusion process of order i and j. The moments are obtained by applying the Itô formula (as stated in Sect. A.1 of the supplementary material) to \((Z_1(t))^i(Z_2(t))^j\) and then taking the expectation. Considering the first and the second moments of the process states results in the following system of ODEs:

$$\begin{aligned} \frac{\mathrm {d}M_{10}(t)}{\mathrm {d}t}&= -\theta _1 M_{10}(t),&M_{10}(t_0)=1,\\ \frac{\mathrm {d}M_{01}(t)}{\mathrm {d}t}&= \mathrm {scale}\, \theta _2 m_0\, M_{10}(t) - \theta _3\, M_{01}(t) + \theta _3 \mathrm {offset},&M_{01}(t_0)=\mathrm {offset},\\ \frac{\mathrm {d}M_{20}(t)}{\mathrm {d}t}&= \frac{\theta _1}{m_0}\, M_{10}(t) - 2\theta _1\, M_{20}(t),&M_{20}(t_0)=1,\\ \frac{\mathrm {d}M_{02}(t)}{\mathrm {d}t}&= \mathrm {scale}^2\theta _2 m_0\, M_{10}(t) + \theta _3(\mathrm {scale} \\&\quad + 2\mathrm {offset})\, M_{01}(t) - 2\theta _3\, M_{02}(t)\\&\quad + 2\mathrm {scale}\, \theta _2 m_0\, M_{11}(t) - \mathrm {scale}\, \theta _3\mathrm {offset},&M_{02}(t_0)=\mathrm {offset}^2,\\ \frac{\mathrm {d}M_{11}(t)}{\mathrm {d}t}&= \theta _3\mathrm {offset} \, M_{10}(t) + \mathrm {scale}\,\theta _2 m_0\\&\quad M_{20}(t) -(\theta _1+\theta _3)\,M_{11}(t),&M_{11}(t_0)=\mathrm {offset}, \end{aligned}$$

where the equations for the two first moments \(M_{10}\) and \(M_{01}\) coincide with the ODE model in Eq. (4). Since in the experiment, only the fluorescence signal is observed, we consider the moments that only depend on the second component of the process, i. e. \(M_{01}\) and \(M_{02}\), as output states for the identifiability analysis. Using DAISY, we obtain that the surrogate model is globally identifiable, i. e. all six parameter values could be uniquely determined if we were able to observe the moments \(M_{01}\) and \(M_{02}\) directly, infinitely long over time, and without measurement error. While this implies that the parameters are structurally identifiable from moment measurements, it does not directly answer our original question of whether the parameters of the SDE model can be identified from perfect observations of a single fluorescence trajectory.

Moreover, as has already been pointed out, structural identifiability analysis assumes a perfect data situation. From its results, we cannot conclude that the parameters will be identifiable from the actual experimental data (which is always finite and usually subject to measurement error). In contrast, the notion of practical identifiability is concerned with the question whether the parameters can be determined from a specific data set (Raue et al. 2009) and we will address this question in Sect. 6.

5.3 Simulation-based assessment of parameter influence

Another attempt to assess parameter identifiability is to simulate from both model types for different parameter settings and compare whether we see differences in the simulation output. To obtain simulations from the ODE model, we use its solution in Eq. (2). Since the ODE model is deterministic, each parameter setting yields one unique output trajectory; whereas for the SDE model, we simulate several trajectories for each parameter setting using the Euler-Maruyama scheme with a time step of 0.01 hours.

5.3.1 Keeping the product \(\mathrm {scale}\,\theta _2m_0\) constant

As already pointed out in Sect. 5.1, the trajectories of the fluorescence intensity for the ODE model are identical if the product \(\mathrm {scale}\,\theta _2m_0\) and the remaining parameters are fixed, even when the individual factors \(\mathrm {scale}\), \(\theta _2\), and \(m_0\) vary. To assess whether this is (at least approximately) also the case for the SDE model, we simulate several trajectories with different values for \(\mathrm {scale}\), \(\theta _2\), and \(m_0\) while keeping their product constant. For each parameter setting, we set the same random seed at the beginning of the simulation. To ensure that the analysis is performed in a relevant regime, we chose the parameters similar to the parameters estimated by Fröhlich et al. (2018): \(\mathrm {scale}\,\theta _2m_0 = 350\), \(\theta _1 = 0.11\), \(\theta _3 = 0.03\), \(\mathrm {offset} = 8.9\), and \(t_0 = 0\). Fig. 2 displays the simulated trajectories.

Fig. 2
figure 2

The ODE trajectory and 20 SDE trajectories of the fluorescence intensity simulated for different values of \(m_0\), \(\theta _2\), and \(\mathrm {scale}\), while keeping their product constant at \(\mathrm {scale}\,\theta _2m_0 = 350\). The remaining parameters are set to \(\theta _1 = 0.11\), \(\theta _3 = 0.03\), \(\mathrm {offset} = 8.9\), and \(t_0 = 0\)

It is evident that the SDE trajectories behave differently for different combinations of \(\mathrm {scale}\), \(\theta _2\), and \(m_0\) which yield the same values of the product \(\mathrm {scale} \theta _2 m_0\). For example, when we keep \(m_0\) fixed while increasing \(\mathrm {scale}\) and decreasing \(\theta _2\), the variation between but also within the trajectories increases. When we keep \(\mathrm {scale}\) fixed while decreasing \(m_0\) and increasing \(\theta _2\), especially the variation between trajectories seems to increase. And finally, when we keep \(\theta _2\) fixed while decreasing \(m_0\) and increasing \(\mathrm {scale}\), the variation between and within the trajectories increases. Our focus is on estimating the parameters from individual observed trajectories. In this context, especially the difference in the variation within the trajectories is relevant.

5.3.2 Swapping the degradation rate constants \(\theta _1\) and \(\theta _3\)

The trajectories of the fluorescence intensity for the ODE model are also identical if the values for \(\theta _1\) and \(\theta _3\) are swapped while the remaining parameters are fixed. We simulate trajectories for the parameter combinations \((\theta _1,\theta _3)=(0.11,0.03)\) and \((\theta _1^*,\theta _3^*)=(0.03, 0.11)\), respectively, while setting the remaining parameters to \(\mathrm {scale}=17.5\), \(\theta _2=0.1\), \(m_0=200\), \(\mathrm {offset}=8.9\), and \(t_0=0\). For the SDE model, we again simulate several trajectories for both parameter settings and set the same random seed at the beginning of the simulation.

Figure 3 shows the ODE trajectory and several SDE trajectories in one panel for each of the two parameter combinations separately. Whereas, Fig. 4 presents one SDE trajectory for each of the two parameter combinations together in one panel. Again, the SDE trajectories do behave differently for the different parameter combinations. While there seems to be only little difference in the variation between the trajectories, the variation within the trajectories is clearly higher for lower \(\theta _1\) and higher \(\theta _3\). This indicates that it may be possible to uniquely determine the values of \(\theta _1\) and \(\theta _3\) even when estimating from only one observed trajectory.

Fig. 3
figure 3

The ODE trajectory and 20 SDE trajectories of the fluorescence intensity simulated for two parameter combinations where the values of \(\theta _1\) and \(\theta _3\) are swapped. The remaining parameters are set to \(\mathrm {scale}=17.5\), \(\theta _2=0.1\), \(m_0=200\), \(\mathrm {offset}=8.9\), and \(t_0=0\)

Fig. 4
figure 4

One trajectory of the fluorescence intensity for the SDE model simulated for each of the two parameter combinations where the values of \(\theta _1\) and \(\theta _3\) are swapped. The remaining parameters are set to \(\mathrm {scale}=17.5\), \(\theta _2=0.1\), \(m_0=200\), \(\mathrm {offset}=8.9\), and \(t_0=0\)

6 Posterior properties and credible intervals

After having studied the structural properties of the ODE and SDE model in the previous section; next, we would like to assess the practical parameter identifiability by trying to estimate the parameters from data as described in Sect. 4. We take a Bayesian approach to parameter estimation because it allows for uncertainty assessment of the parameter estimates and also for handling unobserved process components and measurement error by using Markov chain Monte Carlo (MCMC) methods to sample from the parameter posterior distribution. Therefore, in this section, we define the parameter posterior densities for the two model types and study their properties based on MCMC sampling results for simulated data generated without and with (simulated) measurement error as well as for experimental data.

6.1 Formulation of the inference problem

In Bayesian statistics, we can formulate our assumptions and general knowledge about the model parameters \(\varvec{\xi }\in \Xi \subseteq \mathbb {R}^p\) in terms of a prior distribution with probability density \(p(\varvec{\xi })\). After having observed data \(\mathcal {D}\) about the phenomenon which we are trying to model, we update our knowledge about the parameter and describe it by the posterior distribution with density \(\pi \left( \varvec{\xi }\,\vert \,\mathcal {D}\right) \). In our case, we consider data \(\mathcal {D}= \lbrace y_{k}\rbrace _{k=1,\dots ,K}\) and the vector of all unknown parameters is \(\varvec{\xi }=\left( \varvec{\theta },m_0, \mathrm {scale},\mathrm {offset},t_0,\sigma \right) \). The relation between the prior and the posterior density resulting from Bayes’ theorem is given by:

$$\begin{aligned} \pi \left( \varvec{\xi }\,\vert \,\mathcal {D}\right) \propto p\left( \mathcal {D}\,\vert \,\varvec{\xi }\right) p\left( \varvec{\xi }\right) , \end{aligned}$$

where \(p\left( \mathcal {D}\,\vert \,\varvec{\xi }\right) \) denotes the density of the distribution of \(\mathcal {D}\) conditioned on \(\varvec{\xi }\) and is determined by the considered model. Viewed as a function of the parameter, \(\mathcal {L}\left( \varvec{\xi }\,\vert \,\mathcal {D}\right) := p\left( \mathcal {D}\,\vert \,\varvec{\xi }\right) \) is called the likelihood (function). Comprehensive introductions to Bayesian statistics can be found e. g. in Lee (2012) and Gelman et al. (2013).

To sample from the subsequently formulated posterior densities for the two model types, we use the open source software Stan (Carpenter et al. 2017) which provides an efficient C++ implementation of the Hamiltonian Monte Carlo (HMC) based No-U-Turn Sampler (NUTS). See Sect. A.3 of the supplementary material for a brief introduction to this topic. We use Stan through its R interface rstan (Stan Development Team 2019).

6.1.1 Posterior of the ODE model

For the ODE model, there is a deterministic relationship between the process values \(\varvec{X}(t)\) and the parameters \(\varvec{\theta }\), \(m_0\) and \(t_0\) (or between the fluorescence signal and the parameters additionally including \(\mathrm {scale}\) and \(\mathrm {offset}\), respectively).

Define the index \(k^* := \min \lbrace {k\in \lbrace 1,\dots ,K\rbrace } \vert t_k \ge t_0\rbrace \) of the first observation time point after the mRNA molecules are released, then the posterior density \(\pi \) from which we would like to sample is proportional to

$$\begin{aligned}&\pi \left( \theta ,m_0,\mathrm {scale}, \mathrm {offset}, \sigma ^2, t_0\vert \lbrace y_k\rbrace _{k=1,\dots ,K}\right) \nonumber \\&\quad \propto \left( \prod _{ k=k^* }^{K} \phi \left( \log (y_k)\left| \log \left( \mathrm {scale} \frac{\theta _2 m_0}{\theta _3 {-} \theta _1}\left( e^{-\theta _1 (t-t_0)}-e^{-\theta _3 (t-t_0)} \right) {+} \mathrm {offset}\right) , \sigma ^2\right. \right) \!\right) \nonumber \\&\qquad \cdot \left( \prod _{ k=1 }^{k^*-1} \phi \left( \log (y_k)\left| \log \left( \mathrm {offset}\right) , \sigma ^2\right. \right) \right) \nonumber \\&\qquad \cdot p(\theta _1)p(\theta _2)p(\theta _3)p(m_0)p(t_0)p(\mathrm {scale})p(\mathrm {offset}) p(\sigma ^2), \end{aligned}$$
(6)

where \(\phi (\cdot \vert \mu , \eta ^2)\) denotes the density of the normal distribution with mean \(\mu \) and variance \(\eta ^2\) and the \(p(\cdot )\) denote the parameter prior densities which we assume to be independent.

If the priors \(p(\theta _1)\) and \(p(\theta _3)\) are symmetric to each other, then the posterior is also symmetric with respect to the two degradation rate constants.

The scaling factor \(\mathrm {scale}\), the translation rate constant \(\theta _2\), and the initial amount of mRNA \(m_0\) appear only as a product in the likelihood function; therefore, as pointed out before, at most their product \(\mathrm {scale}\theta _2m_0\) is identifiable.

6.1.2 Posterior of the SDE model

For the SDE model, the states \(\varvec{X}(t_k)\), for \(k=1,\dots ,K\), of the process conditioned on the parameters \(\varvec{\theta }\), \(m_0\) and \(t_0\) are random numbers (for \(t_k\ge t_0\)). Hence, we have to marginalize over the process states to obtain the posterior density of the parameters which we want to infer:

$$\begin{aligned} \pi&\left( \theta ,m_0,\mathrm {scale}, \mathrm {offset}, \sigma ^2, t_0\vert \lbrace y_k\rbrace _{k=1,\dots ,K}\right) \\&= \int _{\mathbb {R}_+^{2\times K}}\pi \left( \theta ,m_0,\mathrm {scale}, \mathrm {offset}, \sigma ^2, t_0 , \lbrace \varvec{X}(t_k)\rbrace _{k=1,\dots ,K}\vert \lbrace y_k\rbrace _{k=1,\dots ,K}\right) \\&\quad \qquad \mathrm {d} \varvec{X}(t_1)\dots \mathrm {d}\varvec{X}(t_K). \end{aligned}$$

Therefore, again defining \(k^* := \min \lbrace {k\in \lbrace 1,\dots ,K\rbrace } \vert t_k \ge t_0\rbrace \), we would need to sample from

$$\begin{aligned} \pi&\left( \theta ,m_0,\mathrm {scale}, \mathrm {offset}, \sigma ^2, t_0 , \lbrace \varvec{X}(t_k)\rbrace _{k=1,\dots ,K}\vert \lbrace y_k\rbrace _{k=1,\dots ,K}\right) \\&\propto \left( \prod _{ k=1 }^{K} \phi \left( \log (y_k)\vert \log \left( \mathrm {scale} \cdot X_2(t_k) + \mathrm {offset}\right) , \sigma ^2\right) \right) \\&\cdot \left( \prod _{ k{=}k^* }^{K-1} \pi \left( \varvec{X}(t_{k+1})\vert \varvec{X}(t_k),\theta \right) \right) \pi \left( \varvec{X}(t_{k^*})\vert \theta , m_0, t_0\right) \left( \prod _{ k=1 }^{k^*-1} \delta (\Vert \varvec{X}(t_k) {-} (0,0)^T\Vert )\right) \\&\quad \cdot p(\theta )p(m_0)p(t_0)p(\mathrm {scale})p(\mathrm {offset}) p(\sigma ^2), \end{aligned}$$

where \(\phi (\cdot \vert \mu , \eta ^2)\) denotes the density of the normal distribution with mean \(\mu \) and variance \(\eta ^2\), \(\delta (\cdot )\) denotes the Dirac delta function, \(\Vert \cdot \Vert \) denotes a norm (e. g. the \(l_2\)-norm), and the factors \(\pi \left( \varvec{X}(t_{k+1})\vert \varvec{X}(t_k),\theta \right) \), \(k=k^*,\ldots ,K-1\), denote the transition density of the process. However, the fact that the process \(\varvec{X}\) switches from a deterministic regime before \(t_0\) to a stochastic one after \(t_0\) complicates the estimation of \(t_0\) together with the remaining parameters. Therefore, we will assume that \(t_0\) is determined beforehand, e. g. based on the estimates for the ODE model. Consequently, we sample from

$$\begin{aligned} \pi&\left( \theta ,m_0,\mathrm {scale}, \mathrm {offset}, \sigma ^2, \lbrace \varvec{X}(t_k)\rbrace _{k=1,\dots ,K}\vert \lbrace y_k\rbrace _{k=1,\dots ,K}, t_0 \right) \nonumber \\&\propto \left( \prod _{ k=1 }^{K} \phi \left( \log (y_k)\vert \log \left( \mathrm {scale} \cdot X_2(t_k) + \mathrm {offset}\right) , \sigma ^2\right) \right) \nonumber \\&\quad \cdot \left( \prod _{ k=k^* }^{K-1} \pi \left( \varvec{X}(t_{k+1})\vert \varvec{X}(t_k),\theta \right) \right) \pi \left( \varvec{X}(t_{k^*})\vert \theta , m_0, t_0\right) \left( \prod _{ k=1 }^{k^*-1} \delta (\Vert \varvec{X}(t_k) - (0,0)^T\Vert )\right) \nonumber \\&\quad \cdot p(\theta )p(m_0)p(\mathrm {scale})p(\mathrm {offset}) p(\sigma ^2). \end{aligned}$$
(7)
Fig. 5
figure 5

One trajectory used in the simulation study that was simulated with Gillespie’s algorithm with parameters \(\theta =(0.2,0.32,0.01)\), \(m_0=240\), \(t_0=0.96\), \(\mathrm {scale}=1.8\), and \(\mathrm {offset}=8.5\), and for the green dotted line, multiplicative measurement error with \(\sigma =0.02\) was added to the fluorescence intensity (FI)

While for the ODE model, the posterior distribution is only 8-dimensional and can be sampled from directly; for the SDE model, we need to sample from a \((7+2K)\)-dimensional distribution and then marginalize over the 2K dimensions of the process states to obtain the posterior distribution of the parameters of interest. Moreover, there is no explicit exact expression for the transition density \(\pi \left( \varvec{X}(t_{k+1})\vert \varvec{X}(t_k),\theta \right) \); wherefore, it will be approximated by a normal density based on the Euler-Maruyama scheme as explained in Sect. A.1 of the supplementary material. For this approximation to be appropriate, we have to ensure that the time steps between observations are small enough. We do so in Sect. A.2 of the supplementary material. For more complex models or lower-resolution data, approaches like pseudo-marginal methods would be required instead of the Euler-Maruyama scheme.

6.2 Estimation based on simulated data

In order to assess how well we can recover the model parameters for both model types from individually observed trajectories, we first work with simulated data that is generated with Gillespie’s algorithm (Gillespie 1976, 1977).

6.2.1 Sampling results for simulated data without measurement error

For now, we assume the fluorescence intensity to be observed without measurement error. We consider synthetic data generated with parameters \(\theta =(0.2,0.32,0.01)\), \(m_0=240\), \(t_0=0.96\), \(\mathrm {scale}=1.8\), and \(\mathrm {offset}=8.5\), which are based on the results by Fröhlich et al. (2018) and thus represent realistic values. The data was simulated with Gillespie’s algorithm over a time interval of 30 hours which yields trajectories similar to the experimental data, and we use the fluorescence intensity at 181 equidistant time points as observations to mirror the structure of the experimental data. The simulated fluorescence intensity (without measurement error) is depicted by the blue dotted line on the right hand side of Fig. 5. We use Stan to sample from the posterior distributions of the ODE model and the SDE model given the simulated data. Since we assume the data to be observed without measurement error, the parameter \(\mathrm {offset}\) can be determined directly from the first observation. Therefore, we do not include measurement error (and thus the parameter \(\sigma \)) and the parameter \(\mathrm {offset}\) in the posterior distribution of the SDE model. Whereas for the ODE model, deviations of the observed data from the deterministic ODE trajectory have to be attributed to measurement error; therefore, the parameter \(\sigma \) has to be included in the posterior distribution of the ODE model. We also include the parameter \(\mathrm {offset}\) for the ODE model in order to avoid degeneracy of the posterior. For the SDE model, we use the mean estimate of \(t_0\) obtained from the sample for the ODE model as the fixed value of \(t_0\). We assume the following prior distributions: \(\theta _i\mathbin {\sim }\mathcal {N}_{\ge 0}(0,5^2)\) for \(i=1,2,3\), \(m_0\mathbin {\sim }\mathcal {N}_{\ge 0}(300,300^2)\), \(\mathrm {scale}\mathbin {\sim }\mathcal {U}(0,30)\), where \(\mathcal {N}_{\ge a}(\mu ,\eta ^2)\) denotes the normal distribution truncated from below by a, and additionally for the ODE model, \(\mathrm {offset}\mathbin {\sim } \mathcal {U}(0,30)\), \(\sigma \mathbin {\sim } \mathcal {U}(0.001,10)\), and \(t_0\mathbin {\sim } \mathcal {U}(0,30)\). These diffuse priors reflect our little prior knowledge.

For both model types, we generate 8 HMC chains of 5000 iterations and discard the first half of the iterations as warm-up. Thus, we use a posterior sample of size 20,000 for each model type in the subsequent analysis. Tables 1 and 2 summarize the Stan output of the posterior samples for the ODE and the SDE model, respectively, and also include the true parameter values that were used to simulate the data for comparison. The tables also contain the \(2.5\%\)-, \(50\%\)-, and \(97.5\%\)-quantiles of the samples. We use the interval between the \(2.5\%\)- and the \(97.5\%\)-quantile as an estimate of the 95%-credible interval (CI). This choice of CI will lead to one wide interval in case of a bimodal density rather than the union of two narrow intervals as it would be the case for a highest posterior density region. This is in line with relating the width of a CI to parameter identifiability and becomes relevant for \(\theta _1\) and \(\theta _3\) in the ODE case, see Fig. 6. For the ODE model, we see that the parameters \(\mathrm {offset}\) and \(t_0\) are well estimated since mean and median of the sample correspond to the true value, the CIs are very narrow, the effective sample size (ESS) \(n_\text {eff}\) is high and \(\hat{R}\) is equal to 1. As expected, the measurement error parameter \(\sigma \) is estimated to be higher than the true value of zero. Of greater interest are the remaining parameters as we can compare the results for them between the two model types.

Table 1 Summary of the Stan output for the ODE model given simulated data without measurement error; the true parameter values that were used to simulate the data; and the prior distributions used for the Stan sampling, where \(\mathcal {N}_{\ge a}(\mu ,\eta ^2)\) denotes the normal distribution truncated from below by a and \(\mathcal {U}(l,u)\) denotes the uniform distribution
Table 2 Summary of the Stan output for the SDE model given simulated data without measurement error, the true parameter values that were used to simulate the data; and the prior distributions used for the Stan sampling

We first focus on the two degradation rate constants \(\theta _1\) and \(\theta _3\). Our analysis in Sect. 5 already showed that for the ODE model, these two parameters are only locally identifiable and the likelihood function landscape is symmetric with respect to them in the case of identical priors for both parameters. This symmetry is for the rather uninformative prior distributions also preserved in the posterior distribution and apparent in the density plots in Fig. 6. The density estimates of the posterior sample for the ODE model are clearly bimodal. The reason that the two modes are not exactly symmetric here is that HMC chains usually are only able to explore one mode and in our example 5 out of the 8 chains happen to end up in the mode where \(\theta _1\) is higher than \(\theta _3\) while only 3 chains converge to the other mode. For sampling methods which facilitate switching between modes, e.g. adaptive parallel tempering, this was not observed (results not shown). The fact that each HMC chain only samples from one of the modes is also the reason for the extremely low ESSs and the very high values of \(\hat{R}\) for \(\theta _1\) and \(\theta _3\) in Table 1. Moreover, note that neither of the modes and not even the ranges of all values in the posterior sample cover the true parameter values of \(\theta _1\) and \(\theta _3\).

For the SDE model on the other hand, Fig. 6 and Table 2 show that the posterior density is clearly unimodal with respect to \(\theta _1\) and \(\theta _3\), and mean and median of the sample are close or equal to the true values. High ESSs and \(\hat{R}\) values equal to 1 are achieved, from which we derive reliability of the 95% CI. These are narrow and cover the true parameter values. Thus, we can conclude that the parameters \(\theta _1\) and \(\theta _3\) are identifiable for the SDE model here.

Fig. 6
figure 6

Density estimates of the posterior samples for parameters \(\theta _1\) and \(\theta _3\) for the SDE (blue, lower triangle) and ODE (green, upper triangle) model given simulated data without measurement error. Diagonal panels: Marginal densities for the respective parameter and boxplots showing the 95% CI as box, the range of the sample as whiskers, and the median as thick black line. Off-diagonal panels: Smoothed scatter plots of the two-dimensional projections of the samples where darker hues signify higher density values. The dotted lines represent the true parameter values that were used to simulate the data

Fig. 7
figure 7

Density estimates of the posterior samples for parameters \(\theta _2\), \(m_0\), \(\mathrm {scale}\), and their products for the SDE (blue, lower triangle) and ODE (green, upper triangle) model given simulated data without measurement error. For a detailed description of the figure’s elements, see Fig. 6

Next, we consider the translation rate constant \(\theta _2\), the initial amount \(m_0\) of mRNA molecules, and the factor \(\mathrm {scale}\). For the ODE, at most the product \(\theta _2m_0\mathrm {scale}\) is identifiable. This is also apparent from the results presented in Table 1 and Fig. 7. For the individual parameters and also for all products of two out of the three parameters, the 95% CIs are extremely broad and the mean and median as point estimates are not at all close to the true values. The reason why there are nevertheless quite high ESSs and \(\hat{R}\) values equal to 1 achieved is that the variation within each of the HMC chains is very high and thus does not differ substantially from the variation between the chains for these parameters as can be seen in the traceplots of the chains in Sect. A.4.1 of the supplementary material. For the product \(\theta _2m_0\mathrm {scale}\) of all three parameters, the 95% CI is very narrow for the posterior sample of the ODE model and also the ESS is high and \(\hat{R}\) equal to 1. Without knowing the true parameter values, one would assume that this product is well estimated. However, the 95% CI and even the whole range of the sample do not cover the true value. For the SDE model, the 95% CI for the product \(\theta _2m_0\mathrm {scale}\) is broader, but it covers the true parameter value and also the mean and the median as point estimates are closer to the true value than the mean and the median for the ODE model. Moreover, the ESS is quite high and \(\hat{R}\) is equal to 1 for the SDE model. We therefore conclude that the product \(\theta _2m_0\mathrm {scale}\) is identifiable. The generally lower ESSs are due to the fact that for the SDE model, we sample from a distribution of much larger dimension as explained in Sect. 6.1. Additionally, for the SDE model, the parameters \(\mathrm {scale}\) and \(\theta _2m_0\) have narrow 95% CIs (especially compared to those for the ODE model) that include the true parameter values, high ESSs, and \(\hat{R}\) values of 1 and can therefore be considered identifiable. The remaining parameters \(\theta _2\), \(m_0\), \(\theta _2\mathrm {scale}\), and \(m_0\mathrm {scale}\) have rather broad 95% CIs and only achieve low ESSs and \(\hat{R}\) values higher than 1.02. Hence, they seem to be non-identifiable. Notice, however, that at least for the parameters \(\theta _2\) and \(\theta _2\mathrm {scale}\), the 95% CIs are substantially more narrow for the SDE model compared to the ODE model.

We have simulated another 99 trajectories with the same parameters and performed Stan sampling in the same way as described in the beginning of this subsection. For each model type and each posterior sample of the different simulated trajectories, we calculate the length of the 95% CI and determine the median and the coefficient of variation (c.v.) over these lengths for each model type. Also, we rescale the lengths of the 95% CI by dividing by the true parameter value and again determine the median of the normalized quantities. The rescaling is done to transfer the values to a more similar scale. Note, however, that the values are nevertheless not directly comparable between different parameters. Moreover, we check whether the true parameter value that was used to simulate the data is included in the 95% CI. Table 3 shows the aggregated results for the posterior samples of all 100 trajectories and also includes the length of the interval between the \(2.5\%\)- and the \(97.5\%\)-quantile of the prior distributions. Except for the parameters \(m_0\) and \(\theta _2m_0\mathrm {scale}\), the median length of the 95% CIs for the SDE model is always smaller than for the ODE model. For parameter \(\theta _2m_0\mathrm {scale}\), the CI lengths are a lot smaller for the ODE model; however, the CIs cover the true parameter value only 13 out of 100 times while for the SDE model, the true value is covered 93 times. For the other parameters that we classified as identifiable for the SDE model in the analysis of the individual trajectory (i. e. \(\theta _1\), \(\theta _3\), \(\mathrm {scale}\), and \(\theta _2m_0\)), the median length of the 95% CIs is clearly smaller for the SDE model than for the ODE model and the true parameter value is covered at least 91 out of 100 times for the SDE model. For parameter \(m_0\), the CI lengths are high for both model types because the parameter is not identifiable for either model type. For the other parameters that we classified as not identifiable for both model types in the analysis of the individual trajectory (i. e. \(\theta _2\), \(\theta _2\mathrm {scale}\), and \(m_0\mathrm {scale}\)), the median length of 95% CIs is clearly smaller for the SDE model than for the ODE model, at least by a factor of 4.

Table 3 Statistics of posterior samples for the two model types aggregated over 100 simulated trajectories without measurement error

The last two columns of Table 3 are visualized in Fig. 8 where we plot the median of the rescaled CI lengths against the number of CIs that cover the true parameter value. The desirable region of value combinations is in the bottom right corner of the graph where the number of CIs covering the true value is high and the median rescaled CI length is small. Note that, clearly, more importance should be given to high numbers of CIs covering the true value as it is useless to be very certain about a parameter estimate (indicated by a short CI) while the correct value is not included in the CI. However, even for parameters that are identifiable, we do not expect to obtain a coverage of the true value of 100% since we are considering 95% CIs. Therefore, values of 100 rather tend to hint at non-identifiability. In Fig. 8, we can see that for the majority of the parameters, the triangles representing the value combinations for the SDE model are closer to the desirable region. Only for parameter \(m_0\) (which is not identifiable for either model type), the value combinations are almost the same for both model types. And as we already pointed out for the product \(\theta _2m_0\mathrm {scale}\), the median length of the 95% CIs is smaller for the ODE model; however, a lot fewer CIs cover the true parameter value for the ODE model than for the SDE model. Thus, the result obtained for the SDE model is to be preferred.

We provide further Stan-specific diagnostics in Appendix A.5. Those mostly show poorer values for the sampling output for the SDE model than for the ODE model. This is not surprising as we sample from a much higher-dimensional distribution for the SDE model. We do not consider the poor diagnostics as a disadvantage of the procedure as they provide information that we do not even have for other MCMC algorithms and thus cannot compare to them.

Fig. 8
figure 8

Statistics of posterior samples for the two model types aggregated over 100 simulated trajectories without measurement error. The desirable region of value combinations is in the bottom right corner of the graph

6.2.2 Sampling results for simulated data with measurement error

In this section, we use the same simulated data as in the previous section, but for each of the 100 trajectories, we add multiplicative measurement error with parameter \(\sigma =0.02\). Again, we use Stan to sample from the posterior distributions of the ODE model (6) and the SDE model (7) for each of the simulated trajectories and use the same priors as stated in the previous section. We generate 8 HMC chains of 5000 iterations, discard the first half of the iterations as warm-up, and thus use a posterior samples of size 20,000 in the subsequent analysis.

Table 4 Summary of the Stan output for the ODE model given simulated data with measurement error, the true parameter values that were used to simulate the data; and the prior distributions used for the Stan sampling
Table 5 Summary of the Stan output for the SDE model given simulated data with measurement error, the true parameter values that were used to simulate the data; and the prior distributions used for the Stan sampling
Fig. 9
figure 9

Density estimates of the posterior samples for parameters \(\theta _1\), \(\theta _3\), \(\mathrm {offset}\), and \(\sigma \) for the SDE (blue, lower triangle) and ODE (green, upper triangle) model given simulated data with measurement error. For a detailed description of the figure’s elements, see Figure 6

At first, we again focus on the results for one of the trajectories, namely the trajectory represented by the green dotted line in Fig. 5. Tables 4 and 5 summarize the Stan output of the posterior samples for the ODE and the SDE model, respectively. The parameter \(t_0\) is estimated very accurately based on the posterior sample for the ODE model. Also, the parameter \(\mathrm {offset}\) is well estimated for both model types but with a more narrow 95% CI for the SDE model. The parameter \(\sigma \) is accurately determined for the SDE model as well. For the ODE model, \(\sigma \) is again overestimated. Figure 9 visualizes the components of the posterior samples for parameters \(\theta _1\), \(\theta _3\), \(\mathrm {offset}\), and \(\sigma \). Again, the bimodality of the posterior with respect to \(\theta _1\) and \(\theta _3\) is apparent for the ODE model and neither the 95% CIs nor the ranges of the sample cover the true parameter values. For the SDE model on the other hand, the distribution is unimodal and the 95% CIs do cover the true parameter values for \(\theta _1\) and \(\theta _3\). However, their 2-dimensional smoothed scatter plot in Fig. 9 does not show a simple elliptic shape (as for the simulated data without measurement error) but almost a banana-like shape. This may also be the reason for the deteriorated sampling efficiency discernible from the low ESSs and higher \(\hat{R}\)-values in Table 5.

Figure 10 visualizes the components of the posterior samples for parameters \(\theta _2\), \(m_0\), \(\mathrm {scale}\), and their products. For the ODE model, again only the product \(\theta _2m_0\mathrm {scale}\) is identifiable in the sense that the corresponding 95% CI is very narrow, the ESS is high, and the \(\hat{R}\)-value is equal to 1. However, the 95% CI again does not cover the true parameter value. For the SDE model, the 95% CI for \(\theta _2m_0\mathrm {scale}\) is broader but it does contain the true value. Also, the ESS is high and the \(\hat{R}\)-value is close to 1. Moreover, the parameters \(\mathrm {scale}\) and \(\theta _2m_0\) have narrow 95% CIs, high ESSs, and \(\hat{R}\)-values close to 1 for the SDE model, and thus, we conclude that they are identifiable. Note that also for \(\theta _2\), \(m_0\mathrm {scale}\), and \(\theta _2\mathrm {scale}\), the 95% CIs are much narrower for the SDE model than for the ODE model (Fig. 10).

In Sect. A.4.1 of the supplementary material, we also include trace plots and further figures of the sampling output for the trajectory displayed in Fig. 5. There, we present the same posterior samples as used in this and the previous subsection. But instead of comparing the posterior samples between the two model types, the posterior samples are compared between the simulated data without and with measurement error for each model type separately. In summary, we find that for the SDE model, the 95% CIs increase for almost all parameters except \(m_0\) for data with measurement error. Whereas for the ODE model, there is hardly any difference for most of the parameters between the posterior samples for the data without and with measurement error since the majority of the parameters is not identifiable anyway. The marginal posterior samples for the parameters \(\mathrm {offset}\), \(t_0\), and \(\theta _2m_0\mathrm {scale}\) are only slightly affected by the measurement error. Only the marginal posterior sample of the measurement error parameter \(\sigma \) is substantially affected and, as expected, consists of higher values for data with measurement error.

Fig. 10
figure 10

Density estimates of the posterior samples for parameters \(\theta _2\), \(m_0\), \(\mathrm {scale}\), and their products for the SDE (blue, lower triangle) and ODE (green, upper triangle) model given simulated data with measurement error. For a detailed description of the figure’s elements, see Fig. 6

Table 6 Statistics of posterior samples for the two model types aggregated over 100 simulated trajectories with measurement error

Table 6 and Fig. 11 display the statistics of the posterior samples aggregated over the 100 simulated trajectories. Similar to the results for the simulated data without measurement error, the median length of the 95% CIs for the SDE model is always smaller than for the ODE model, except for the parameters \(m_0\) and \(\theta _2m_0\mathrm {scale}\) and additionally \(\sigma \) (which was not included for the SDE model in the previous subsection). Again, for the majority of the parameters, the results for the SDE model represented by triangles in Fig. 11 are closer to the desirable region of value combinations in the bottom right corner of the graph, except for the parameters \(m_0\), \(\theta _2m_0\mathrm {scale}\), and \(\mathrm {offset}\). For the parameter \(\mathrm {offset}\), the median CI length is slightly higher for the ODE model than for the SDE model, however, the CIs for the ODE model also contain the true parameter value more often. So for this parameter, the ODE model, for once, shows the preferable result.

Fig. 11
figure 11

Statistics of posterior samples for the two model types aggregated over 100 simulated trajectories with measurement error. The desirable region of value combinations is in the bottom right corner of the graph

6.3 Estimation based on experimental data

Since our results in the previous section have shown that the SDE model yields more reliable parameter estimates (assuming an MJP is an adequate description for the translation kinetics after mRNA transfection) than the ODE model even for those parameters that are identifiable for both model types, we reanalyze the experimental data published in Fröhlich et al. (2018) and described in Sect. 2. For each type of GFP (eGFP and d2eGFP), we randomly select 100 observed trajectories for our analysis and again use Stan to sample from the posterior distributions of the ODE model (6) and the SDE model (7) for each of the trajectories using the same priors as stated in Sect. 6.2.1. We generate 8 HMC chains of 5000 iterations, discard the first half of the iterations as warm-up, and thus use a posterior samples of size 20,000 in the subsequent analysis. For each type of GFP, we first analyze the sampling output for one observed trajectory in detail and then summarize results for all 100 observed trajectories. Here in the main part of the article, we only show the results for the eGFP data and provide the results for the d2eGFP data in Appendix A.4.3. The comparison between the eGFP and the d2eGFP trajectories demonstrate how practical identifiability may differ between seemingly similar datasets. Moreover, we provide further Stan-specific diagnostics in Appendix A.5.

6.3.1 Sampling results for experimental dataset 1 (for eGFP)

Tables 7 and 8 present a summary of the Stan output for the posterior sample of one observed trajectory for the ODE and the SDE model, respectively, and Figs. 12 and 13 compare the density estimates of these two posterior samples. Moreover, we provide trace plots for these two posterior samples in Section A.4.2 in the supplementary material. The results look qualitatively very similar (almost identical) to those obtained for the simulated data with measurement error in Sect. 6.2.2. Therefore, we do not repeat the detailed description but only point out that the range of values sampled for the parameters \(\theta _1\) and \(\theta _3\) for the SDE model is slightly smaller for the experimental trajectory here. Thus, we do not see the banana-like shape in the two-dimensional smoothed scatter plot of the two parameters for the SDE model in Fig. 12 as for the simulated trajectory in Fig. 9 and the sampling efficiency increases as indicated by higher ESSs and lower \(\hat{R}\) values for the two parameters in Table 8.

Table 7 Summary of the Stan output for the ODE model given experimental data for eGFP and the prior distributions used for the sampling. c.v. denotes the coefficient of variation and the columns headed by percentages contain the quantiles of the respective percentage value
Table 8 Summary of the Stan output for the SDE model given experimental data for eGFP and the prior distributions used for the sampling. The initial time point \(t_0\) is not estimated here, but predetermined based on the mean estimate of the sample for the ODE model
Fig. 12
figure 12

Density estimates of the posterior samples for parameters \(\theta _1\), \(\theta _3\), \(\mathrm {offset}\), and \(\sigma \) for the SDE (blue, lower triangle) and ODE (green, upper triangle) model given experimental data for eGFP. Diagonal panels: Marginal densities for the respective parameter and boxplots showing the 95% CI as box, the range of the sample as whiskers, and the median as thick black line. Off-diagonal panels: Smoothed scatter plots of the two-dimensional projections of the samples where darker hues signify higher density values

Fig. 13
figure 13

Density estimates of the posterior samples for parameters \(\theta _2\), \(m_0\), \(\mathrm {scale}\), and their products for the SDE (blue, lower triangle) and ODE (green, upper triangle) model given experimental data for eGFP. For a detailed description of the figure’s elements, see Fig. 12

The statistics of posterior samples aggregated for 100 experimental trajectories for eGFP in Table 9 are also qualitatively similar to those for the simulated trajectories in Table 6. For the majority of the parameters, the median length of the 95% CI is smaller for the posterior samples for the SDE model than for those for the ODE model. Only for parameters \(\theta _1\), \(\theta _2\), and \(\theta _2 m_0 \mathrm {scale}\), this is not the case. Note in particular that for the parameters \(\theta _2m_0\) and \(\mathrm {scale}\), which are non-identifiable for the ODE model (also apparent from the very long CIs here), the median length of the 95% CI for the SDE model is again much smaller compared to that of the ODE and to that of the prior. This indicates that these two parameters are identifiable for the SDE model also for the experimental data. That the uncertainty of the parameter estimate for \(\theta _2m_0\mathrm {scale}\) is greater for the SDE than for the ODE model is consistent with our results for the simulated data. The parameter \(\theta _2\) is considered to be non-identifiable for both model types and the difference between the median CI lengths is relatively small. Finally, for parameter \(\theta _1\), we see that the result is more or less the same as for \(\theta _3\) for the ODE model due to the symmetry of the posterior distribution with respect to these two parameters. Whereas for the SDE model there is no symmetry and there is more variance in the posterior samples with respect to \(\theta _1\) than to \(\theta _3\) (indicated by a greater median CI length). The smaller median CI length of \(\theta _1\) for the ODE model compared to the SDE model is due to the fact that for many of the observed trajectories the values of \(\theta _1\) and \(\theta _3\) seem to be very close together. In this case, the posterior distribution of the ODE model appears to be unimodal (as the two modes overlap) and the posterior variance with respect to the two parameters is small (and equal due to the symmetry). Thus, overall this variance is smaller than the posterior variance with respect to \(\theta _1\) for the SDE model.

Table 9 Statistics of posterior samples aggregated for 100 experimental trajectories for eGFP

7 Discussion and conclusion

We have modeled the translation kinetics after mRNA transfection using a two-dimensional Itô diffusion process described by an SDE and compared this modeling approach to one using ODEs. We have studied the parameter identifiability for both modeling approaches for the case that we observe a fluorescence signal which we assume to be a linear transformation of the amount of protein molecules (corrupted by multiplicative measurement error). Our analysis using the structural identifiability of the moment-equations-based surrogate model and a simulation-based assessment suggest that the SDE model might lead to better parameter identifiability. The most systematic approach is the one based on the surrogate model and DAISY as suggested by Browning et al. (2020). However, it cannot help us confirm a difference in the parameter identifiability between the SDE and the ODE model, especially because we are interested in the parameter identifiability based on a single observed trajectory and the moment-equations-based approach assumes that we are able to observe the first and the second moment of the fluorescence signal. Even when we take into account that we have several observed trajectories available from the experiment, these do not provide information about the moments because the initial time point \(t_0\) of mRNA release is different for every trajectory and also for the other parameters, in particular for \(m_0\), assuming that they are equal for all observed cells does not seem reasonable.

By simulating from the SDE model, we were able to assess the differences in the variation within individual trajectories for different parameter combinations. Again our results suggest that the SDE model provides better parameter identifiability. Unlike for the ODE model, the degradation rates \(\theta _1\) and \(\theta _3\) for the SDE model appear to be structurally globally identifiable. Also, \(\mathrm {scale}\) and the product \(\theta _2m_0\) seem to be structurally identifiable, but the individual parameters \(\theta _2\) and \(m_0\) do not. Identifiable parameter combinations sometimes hint towards effective reparameterisations for more efficient sampling, but this was not the case here. While the simple simulation approach worked out well for the model considered here, one of its weak points is, of course, the somewhat subjective visual assessment of the variation within trajectories. A more quantitative approach to this would be to simulate a large number of trajectories (with very small time step) for every considered parameter combination, to approximate the quadratic variation for each trajectory, and then, to compare these values between individual trajectories started with the same seed for different parameter combinations and to compare also the distributions of these values for different parameter combinations. Another drawback of both simulation-based approaches is the fact that the analysis is based on a finite set of parameter combinations that can be considered; and thus, drawing general conclusions for the entire parameter space may be problematic.

Moreover, we have assessed the practical parameter identifiability for both model types by sampling from the parameter posterior distribution given simulated data with and without measurement error and the experimental data published in Fröhlich et al. (2018). Section A.4.1 provides additional sampling results for the first case. We observe that CIs and ranges of posterior samples become visibly larger in the presence of measurement error for some but not all parameters. While our focus was on inference from individual fluorescence trajectories, using SDE mixed-effect modeling as done in Wiqvist et al. (2021) would be a meaningful extension, especially e. g. to obtain a common estimate for the parameter \(\mathrm {scale}\). We found that the parameters \(\theta _1\) and \(\theta _3\) are indeed globally identifiable for the SDE model given individual trajectories, unlike for the ODE model. And not only the product \(\theta _2m_0\mathrm {scale}\) but also the parameter \(\mathrm {scale}\) and the product \(\theta _2m_0\) are globally identifiable for the SDE model. Moreover, for the simulated data, the 95% CIs for the identifiable parameters for the SDE model covered the true parameter value adequately many times. Whereas for the ODE model, the true parameter values for the parameters \(\theta _1\), \(\theta _3\), and \(\theta _2m_0\mathrm {scale}\) were not covered by the 95% CIs for many of the posterior samples and were sometimes not even included in the range of values in the sample. The fact that the parameters \(\theta _1\) and \(\theta _3\) can be adequately determined using the SDE modeling approach given an individual trajectory renders the multi-experiment approach with different mRNA constructs and the computationally intense hierarchical optimization algorithm used in Fröhlich et al. (2018) unnecessary in the case that the determination of these parameters is the main objective. Besides, assuming that an MJP is the most appropriate description of the underlying dynamics, we saw that the estimated parameter values for a single cell trajectory based on the ODE model cannot be trusted even when narrow 95% CIs suggest low uncertainty. This has implications for previous studies using this ODE model (e.g.(Leonhardt et al. 2014)). The ODE-based nonlinear mixed-effect model used by Fröhlich et al. (2018) might have partially resolved the problem by allowing the integration of the datasets for eGFP and d2eGFP, thereby eliminating the symmetry. Yet, as the SDE model provides overall more appropriate results for the fitting of individual trajectories, an SDE-based nonlinear mixed-effect model would probably be an even better alternative.

While the SDE model is clearly superior in terms of the information that we are able to extract from a single trajectory about the parameters that determine the dynamics of the underlying process, it has nevertheless several disadvantages. First of all, we were not able to include the estimation of the initial time point \(t_0\) of mRNA release into the Stan sampling procedure due to technical/methodological limitations. Other sampling approaches such as particle MCMC (Golightly and Wilkinson 2011) might alleviate this problem, but to our knowledge, no examples of inferring an unknown time point for SDE models have been investigated so far and would thus require further work. Another drawback of the SDE model are the higher computational costs as we need to sample from a higher-dimensional distribution (due to the random process values) than for the ODE model. For the SDE model, the sampling in our study takes on average almost 5.5 hours while for the ODE model, it averages at about 20 minutes. Note, however, that for the ODE model, the NUTS method implemented in Stan generated only chains that sample only from one of the posterior modes. Therefore, one might argue that sampling did not properly converge in this example. In general, estimation procedures for SDE models are more complex and unlike for ODE models, publicly available software tools are rare and usually not generally applicable. There is a clear need to further develop such tools for SDE models in order to harness their full potential, especially with regard to better identifiability of kinetic parameters. On the other hand, combining both modeling approaches as we have done here by first determining \(t_0\) based on the ODE model and then estimating the kinetic parameters based on the SDE model is certainly also meaningful. Overall, our study clearly demonstrates the relevance of considering different modeling approaches and of selecting the appropriate one. We expect to see stochastic models being used more frequently as increasing computing power becomes available and facilitates inference for such models.