Least Squares Estimation in Stochastic Biochemical Networks

  • Original Article
  • Bulletin of Mathematical Biology

Abstract

The paper presents results on the asymptotic properties of the least-squares estimates (LSEs) of the reaction constants in mass-action, stochastic, biochemical network models. LSEs are assumed to be based on the longitudinal data from partially observed trajectories of a stochastic dynamical system, modeled as a continuous-time, pure jump Markov process. Under certain regularity conditions on such a process, it is shown that the vector of LSEs is jointly consistent and asymptotically normal, with the asymptotic covariance structure given in terms of a system of ordinary differential equations (ODE). The derived asymptotic properties hold true as the biochemical network size (the total species number) increases, in which case the stochastic dynamical system converges to the deterministic mass-action ODE. An example is provided, based on synthetic as well as RT-PCR data from the retro-transcription network of the LINE1 gene.

Notes

  1. For the purposes of the current discussion, it is convenient to extend the notion of LSE and define \(\hat{\theta}\) as any solution (possibly nonunique and/or local) of this optimization problem.

References

  • Andersson, H., & Britton, T. (2000). Lecture notes in statistics: Vol. 151. Stochastic epidemic models and their statistical analysis (1st ed.). Berlin: Springer.

  • Arkin, A., Shen, P., & Ross, J. (1997). A test case of correlation metric construction of a reaction pathway from measurements. Science, 277, 1275–1279.

  • Bain, A., & Crişan, D. (2008). Fundamentals of stochastic filtering (Vol. 60). Berlin: Springer.

  • Baker, S. M., Schallau, K., & Junker, B. H. (2010). Comparison of different algorithms for simultaneous estimation of multiple parameters in kinetic metabolic models. J. Integr. Bioinform., 7(3).

  • Ball, K., Kurtz, T., Popovic, L., & Rempala, G. (2006). Asymptotic analysis of multiscale approximations to reaction networks. Ann. Appl. Probab., 16(4), 1925–1961.

  • Billingsley, P. (1999). Convergence of probability measures (Vol. 316). New York: Wiley-Interscience.

  • Blanchard, S. C., Kim, H. D., Gonzalez, R. L. J., Puglisi, J. D., & Chu, S. (2004). tRNA dynamics on the ribosome during translation. Proc. Natl. Acad. Sci. USA, 101(35), 12893–12898.

  • Choi, B., & Rempala, G. A. (2012). Inference for discretely observed stochastic kinetic networks with applications to epidemic modeling. Biostatistics, 13(1), 153–165.

  • Ethier, S. N., & Kurtz, T. G. (1986). Markov processes: characterization and convergence. Wiley series in probability and mathematical statistics. New York: Wiley.

  • Finkenstädt, B., Heron, E. A., Komorowski, M., Edwards, K., Tang, S., Harper, C. V., Davis, J. R. E., White, M. R. H., Millar, A. J., & Rand, D. A. (2008). Reconstruction of transcriptional dynamics from gene reporter data using differential equations. Bioinformatics, 24(24), 2901–2907.

  • Gillespie, D. T. (1992). A rigorous derivation of the chemical master equation. Physica A, 188, 404–425.

  • Kim, J., Craciun, G., Pantea, C., & Rempala, G. (2011). Statistical model for biochemical networks inference. Commun. Stat., Simul. Comput. doi:10.1080/03610918.2011.633200.

  • Komorowski, M., Costa, M. J., Rand, D. A., & Stumpf, M. P. H. (2011). Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proc. Natl. Acad. Sci. USA, 108(21), 8645–8650.

  • Kurtz, T. G. (1972). The relationship between stochastic and deterministic models for chemical reactions. J. Chem. Phys., 57(7), 2976–2978.

  • Kurtz, T. G. (1981). Approximation of discontinuous processes by continuous processes. In L. Arnold & R. Lefever (Eds.), Proceedings, Bielefeld conf on stochastic nonlinear systems in physics, chemistry and biology (pp. 22–35). Berlin: Springer.

  • Marbach, D., Prill, R. J., Schaffter, T., Mattiussi, C., Floreano, D., & Stolovitzky, G. (2010). Revealing strengths and weaknesses of methods for gene network inference. Proc. Natl. Acad. Sci. USA, 107(14), 6286–6291.

  • Margolin, A. A., & Califano, A. (2007). Theory and limitations of genetic network inference from microarray data. Ann. N.Y. Acad. Sci., 1115, 51–72.

  • Masters, J. R. (2002). HeLa cells 50 years on: the good, the bad and the ugly. Nat. Rev. Cancer, 2(4), 315–319.

  • McQuarrie, D. A. (1967). Stochastic approach to chemical kinetics. J. Appl. Probab., 4, 413–478.

  • Perez, O. D., Krutzik, P. O., & Nolan, G. P. (2004). Flow cytometric analysis of kinase signaling cascades. Methods Mol. Biol., 263, 67–94.

  • Raue, A., Becker, V., Klingmüller, U., & Timmer, J. (2010). Identifiability and observability analysis for experimental design in nonlinear dynamical models. Chaos, 20(4), 045105.

  • Rempala, G. A., Ramos, K. S., & Kalbfleisch, T. (2006). A stochastic model of gene transcription: an application to L1 retrotransposition events. J. Theor. Biol., 242(1), 101–116.

  • Rempala, G. A., Ramos, K. S., Kalbfleisch, T., & Teneng, I. (2007). Validation of a mathematical model of gene transcription in aggregated cellular systems: application to L1 retrotransposition. J. Comput. Biol., 14(3), 85–95.

  • Samoilov, M., Arkin, A., & Ross, J. (2001). On the deduction of chemical reaction pathways from measurements of time series of concentrations. Chaos, 11(1), 108–114.

  • Sassaman, D., Dombroski, B., Moran, J., Kimberland, M., Naas, T., DeBerardinis, T., Gabriel, A., Swergold, G., & Kazazian, S. Jr. (1997). Many human L1 elements are capable of retrotransposition. Nat. Genet., 16, 37–43.

  • Transtrum, M. K., Machta, B. B., & Sethna, J. P. (2011). Geometry of nonlinear least squares with applications to sloppy models and optimization. Phys. Rev. E, Stat. Nonlinear Soft Matter Phys., 83(3 Pt 2), 036701.

  • Wheeler, D. A., Srinivasan, M., Egholm, M., Shen, Y., Chen, L., McGuire, A., He, W., Chen, Y.-J., Makhijani, V., Roth, G. T., Gomes, X., Tartaro, K., Niazi, F., Turcotte, C. L., Irzyk, G. P., Lupski, J. R., Chinault, C., Song, X.-z., Liu, Y., Yuan, Y., Nazareth, L., Qin, X., Muzny, D. M., Margulies, M., Weinstock, G. M., Gibbs, R. A., & Rothberg, J. M. (2008). The complete genome of an individual by massively parallel DNA sequencing. Nature, 452(7189), 872–876.

  • Wilkinson, D. J. (2009). Stochastic modelling for quantitative description of heterogeneous biological systems. Nat. Rev. Genet., 10(2), 122–133.

  • Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310), 1102–1104.

  • Yoon, J., & Deisboeck, T. S. (2009). Investigating differential dynamics of the MAPK signaling cascade using a multi-parametric global sensitivity analysis. PLoS ONE, 4(2), e4560.

  • Zacharof, A. I., & Butler, A. P. (2004). Stochastic modelling of landfill leachate and biogas production incorporating waste heterogeneity. Model formulation and uncertainty analysis. Waste Manag., 24(5), 453–462.

  • Zamir, E., & Bastiaens, P. I. H. (2008). Reverse engineering intracellular biochemical networks. Nat. Chem. Biol., 4(11), 643–647.

Acknowledgement

This research was partially funded by the US National Science Foundation under grant DMS-1106485. The author is grateful to both referees and the associate editor for their comments and suggestions, which helped him improve the manuscript.

Author information

Corresponding author

Correspondence to Grzegorz A. Rempala.

Appendix: Density Dependent Markov Processes

Suppose that for each \(n\ge 1\), \(Z_{n}=\{Z_{n}(t);\,t\ge 0\}\) is a continuous-time Markov process on the d-dimensional lattice \(\mathcal{Z}^{d}\) with the jump intensities

$$ q^{(n)}_{z,z+l}=n\beta_l\bigl(n^{-1}z \bigr),\quad z,l\in \mathcal{Z}^d $$

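and the transition probabilities

$$ P\bigl(Z_n(t+h)=z+l \mid Z_n(t)=z\bigr)=h\,n\beta_l\bigl(n^{-1}z\bigr)+o(h),\quad l\ne 0, \qquad P\bigl(Z_n(t+h)=z \mid Z_n(t)=z\bigr)=1-h\,n\sum_{l}\beta_l\bigl(n^{-1}z\bigr)+o(h). $$
(19)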

We assume that the process has a finite number of possible transitions, that is, there exists only a finite number of \(l\in \mathcal{Z}^{d}\) such that \(\sup_{x}\beta_{l}(x)>0\). We also assume that \(\beta_{l}(x)\) is a continuous function of x for each l and that the starting point \(Z_{n}(0)\) is nonrandom. Such processes are called density dependent Markov jump processes (DDMJP) since their rates depend on the process density (its state normalized by n). Note that if \(Y_{l}=\{Y_{l}(t);\,t\ge 0\}\) (indexed by l) is a collection of independent unit Poisson processes, then

$$ Z_n(t)=Z_n(0)+\sum _l lY_l \biggl(n\int_0^t \beta_l\bigl(n^{-1}Z_n(s)\bigr)\,ds \biggr). $$
(20)

Note that the above process satisfies (19) and that, under our assumption (2), the relation (3) implies that (20) is asymptotically equivalent to the process with the trajectories (1). Indeed, in the current notation the latter process is seen to have the jump intensities given by

$$ \tilde{q}^{(n)}_{z,z+l}=n\beta_l \biggl(n^{-1}z+O \biggl(\frac{1}{n} \biggr) \biggr) . $$
(21)

In order to analyze the asymptotic behavior of (1), it suffices therefore to consider DDMJP given by (20) (see, e.g., Ethier and Kurtz 1986, Chap. 11).
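The representation (20) can be illustrated numerically. The following is a minimal sketch, not the author's code, of a direct (Gillespie-type) simulation of a DDMJP in which the jump z → z + l occurs with intensity nβ_l(z/n). The function name `simulate_ddmjp`, the restriction to a one-dimensional state, and the pure death example (l = −1, β(x) = μx with μ = 1) are all assumptions made for illustration.

```python
import random

def simulate_ddmjp(z0, transitions, n, t_end, rng):
    """Simulate a one-dimensional density dependent Markov jump process.

    transitions is a list of (l, beta_l) pairs: an integer jump size l and a
    rate function beta_l of the density x = z / n. The jump z -> z + l occurs
    with intensity n * beta_l(z / n). Returns the jump times and the states.
    """
    t, z = 0.0, z0
    times, states = [t], [z]
    while True:
        rates = [n * beta(z / n) for _, beta in transitions]
        total = sum(rates)
        if total <= 0.0:              # absorbing state: no further jumps
            break
        t += rng.expovariate(total)   # exponential waiting time to next jump
        if t > t_end:
            break
        # Choose which transition fires, with probability rate / total.
        u = rng.random() * total
        acc = 0.0
        jump = transitions[-1][0]     # fallback guards against rounding
        for (l, _), rate in zip(transitions, rates):
            acc += rate
            if u < acc:
                jump = l
                break
        z += jump
        times.append(t)
        states.append(z)
    return times, states

# Hypothetical example: pure death process, l = -1 and beta(x) = 1.0 * x.
n = 1000
times, states = simulate_ddmjp(n, [(-1, lambda x: 1.0 * x)], n, 2.0,
                               random.Random(0))
print(len(times) - 1, "jumps; final density", states[-1] / n)
```

Dividing the returned states by n gives a trajectory of the density process, which is the object studied in the remainder of this appendix.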

Let us denote by \(\hat{Y}_{l}\) the centered Poisson processes, that is, \(\hat{Y}_{l}(t)=Y_{l}(t)-t\). Also, let \(\bar{Z}_{n}(t)=Z_{n}(t)/n\) and \(F(x)=\sum_{l} l\beta_{l}(x)\). Then (20) is equivalent to

$$ \bar{Z}_n(t)=\bar{Z}_n(0)+n^{-1}\sum _l l\hat{Y}_l \biggl(n\int _0^t \beta_l\bigl( \bar{Z}_n(s)\bigr)\,ds \biggr)+\int_0^t F \bigl(\bar{Z}_n(s)\bigr)\,ds. $$
(22)

The following result of Kurtz (see Andersson and Britton 2000, Chap. 5) establishes the strong law of large numbers (SLLN) for \(\bar{Z}_{n}(t)\).

Theorem A.1

(SLLN)

Let \(\lim_{n\to\infty} \bar{Z}_{n}(0)=z_{0}\) (nonrandom) and suppose that for any compact \(K\subset \mathbb{R}^{d}\) there exists a constant \(M_{K}\) such that \(|F(x)-F(y)|\le M_{K}|x-y|\) for all \(x,y\in K\). Then

$$\lim_{n\to\infty}\sup_{s\le t}\bigl\vert \bar{Z}_n(s)-z(s) \bigr\vert=0 \quad\mbox{\textit{a.s.}}, $$

where z(t) is the unique solution of the integral equation

$$ z(t)=z_0+\int_0^t F\bigl(z(s)\bigr)\,ds. $$
(23)
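A small numerical check of Theorem A.1 can be sketched as follows, under the assumption of a pure death process (a single transition l = −1 with β(x) = μx, started at \(Z_{n}(0)=n\)), for which F(x) = −μx and the solution of (23) is z(s) = exp(−μs). The function name `sup_deviation` and the parameter values are hypothetical.

```python
import math
import random

def sup_deviation(n, mu, t_end, rng):
    """Return sup_{s <= t_end} |Z_n(s)/n - z(s)| for a pure death process.

    Assumed example: one transition l = -1 with beta(x) = mu * x, started at
    Z_n(0) = n, so that F(x) = -mu * x and z(s) = exp(-mu * s).
    """
    t, z, worst = 0.0, n, 0.0
    while z > 0:
        # Total intensity n * beta(z / n) equals mu * z.
        t_next = t + rng.expovariate(mu * z)
        if t_next > t_end:
            break
        limit = math.exp(-mu * t_next)
        # The path is piecewise constant and z(s) is monotone, so the supremum
        # is attained at a jump time (just before or just after the jump) or
        # at t_end.
        worst = max(worst, abs(z / n - limit), abs((z - 1) / n - limit))
        t, z = t_next, z - 1
    return max(worst, abs(z / n - math.exp(-mu * t_end)))

rng = random.Random(1)
for n in (100, 10_000):
    print(n, sup_deviation(n, mu=1.0, t_end=2.0, rng=rng))
```

In this example the deviation printed for the larger n would typically be about ten times smaller, in line with the \(1/\sqrt{n}\) scaling of the fluctuations discussed next.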

Thus, the normed jump Markov vector process \(\bar{Z}_{n}\), for a large population n, is approximately equal to the deterministic vector function z defined by (23). It turns out that the deviations between the two are of order \(1/\sqrt{n}\). To this end, define first

$$W_l^{(n)}(t) = \sqrt{n} \bigl(n^{-1}Y_l(nt) - t\bigr) = n^{-1/2}\hat{Y}_l(nt). $$

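By Donsker’s theorem (Billingsley 1999) it follows that \(W_{l}^{(n)}\) converges weakly to the standard Brownian motion \(W_{l}\). Define now a scaled and centered process \(V_{n}\) as follows:

$$ V_n(t)=\sqrt{n}\bigl(\bar{Z}_n(t)-z(t)\bigr)=v_n(0)+\sum_l lW_l^{(n)} \biggl(\int_0^t \beta_l\bigl(\bar{Z}_n(s)\bigr)\,ds \biggr)+\sqrt{n}\int_0^t \bigl[F\bigl(\bar{Z}_n(s)\bigr)-F\bigl(z(s)\bigr)\bigr]\,ds. $$
(24)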

Of course, \(v_{n}(0) = \sqrt{n} (\bar{Z}_{n}(0) - z(0)) \), which by assumption is nonrandom. The second equality above is a direct consequence of the definitions of \(W_{l}^{(n)}\), \(\bar{Z}_{n}\) and z. We can expand the integrand on the far right by Taylor’s theorem, so that

$$ \sqrt{n}\bigl[F\bigl(\bar{Z}_n(s)\bigr)-F\bigl(z(s)\bigr)\bigr]=\partial F\bigl(z(s)\bigr)V_n(s)+\sqrt{n}\,o\bigl(\bigl\vert \bar{Z}_n(s)-z(s)\bigr\vert\bigr), $$

where \(\partial F=(\partial_{j}F_{i})\) is the matrix function of partial derivatives. From Theorem A.1, we know that \(\bar{Z}_{n}\) converges to z, and since \(W_{l}^{(n)}\) converges to \(W_{l}\), the standard Brownian motion, one might anticipate that \(V_{n}\) converges to a process V defined by the integral equation

$$ V(t) = v_0 + \sum_l lW_l \biggl(\int_0^t \beta_l\bigl(z(s)\bigr)\,ds \biggr)+ \int_0^t \partial F\bigl(z(s)\bigr)V(s)\,ds. $$
(25)

This is formally stated as the following theorem due to Kurtz (see Andersson and Britton 2000, Chap. 5), where we set \(G(x)=\sum_{l} l l^{\top}\beta_{l}(x)\). Note that \(\beta_{l}(x)\in\mathbb{R}\) and \(l l^{\top}\in\mathbb{R}^{d\times d}\), since l is taken as a column vector.

Theorem A.2

(CLT)

Suppose \(\partial F\) is continuous and \(\lim_{n\to\infty} v_{n}(0)=v_{0}\) (constant). Then \(V_{n}\Rightarrow V\), the process defined in Eq. (25). This process V is a Gaussian vector process with covariance matrix

$$\operatorname {Cov}\bigl(V(t), V(r)\bigr) = \int_0^{r \wedge t} \varPhi(t,s)G\bigl(z(s)\bigr) \bigl(\varPhi(r, s)\bigr)^\top \,ds. $$

Here, Φ is a matrix function defined as the solution of

$$\varPhi^\prime_2(t,s) = -\varPhi(t, s) \partial F\bigl(z(s)\bigr), \qquad\varPhi(s, s) = I, $$

where \(\varPhi^{\prime}_{2}\) denotes the partial derivative with respect to s.
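As an illustration of the covariance formula in Theorem A.2, consider again an assumed scalar pure death example with l = −1 and β(x) = μx. Then F(x) = −μx, ∂F = −μ, G(x) = μx, z(s) = z₀exp(−μs), and Φ(t,s) = exp(−μ(t−s)) solves the equation for Φ above. The sketch below (with hypothetical function names) evaluates Var V(t) by the midpoint rule and compares it with the closed form z₀p(1−p), p = exp(−μt), obtained by integrating by hand.

```python
import math

def var_v(t, mu, z0, steps=2000):
    """Approximate Var V(t) = int_0^t Phi(t,s) G(z(s)) Phi(t,s) ds (midpoint rule).

    Assumed scalar example (pure death): G(x) = mu * x,
    z(s) = z0 * exp(-mu * s) and Phi(t, s) = exp(-mu * (t - s)).
    """
    h = t / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h                   # midpoint of the k-th subinterval
        phi = math.exp(-mu * (t - s))       # Phi(t, s)
        g = mu * z0 * math.exp(-mu * s)     # G(z(s))
        total += phi * g * phi * h
    return total

def var_v_closed_form(t, mu, z0):
    """Closed form of the same integral: z0 * p * (1 - p), p = exp(-mu * t)."""
    p = math.exp(-mu * t)
    return z0 * p * (1.0 - p)

print(var_v(1.0, 1.0, 1.0), var_v_closed_form(1.0, 1.0, 1.0))
```

The closed form agrees with the binomial variance of a pure death process with independent exponential lifetimes, scaled by n, which gives a sanity check on the formula in this simple case.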

About this article

Cite this article

Rempala, G.A. Least Squares Estimation in Stochastic Biochemical Networks. Bull Math Biol 74, 1938–1955 (2012). https://doi.org/10.1007/s11538-012-9744-y

  • DOI: https://doi.org/10.1007/s11538-012-9744-y
