A Reaction Network Scheme Which Implements the EM Algorithm

  • Conference paper

DNA Computing and Molecular Programming (DNA 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11145)

Abstract

A detailed algorithmic explanation is required for how a network of chemical reactions can generate the sophisticated behavior displayed by living cells. Though several previous works have shown that reaction networks are computationally universal and can in principle implement any algorithm, there is scope for constructions that map well onto biological reality, make efficient use of the computational potential of the native dynamics of reaction networks, and make contact with statistical mechanics. We describe a new reaction network scheme for solving a large class of statistical problems. Specifically we show how reaction networks can implement information projection, and consequently a generalized Expectation-Maximization algorithm, to solve maximum likelihood estimation problems in partially-observed exponential families on categorical data. Our scheme can be thought of as an algorithmic interpretation of E. T. Jaynes’s vision of statistical mechanics as statistical inference.


Notes

  1. This situation can arise because only a linear projection is observable. It can also happen because we require a rich family of probability distributions on the space of observations, but do not want to give up the nice properties of exponential families. We can achieve both by imagining that our observation s arises as the projection of a data vector x living in a higher-dimensional space, and then employing an exponential family of probability distributions on this higher-dimensional space.

References

  1. Amari, S.: Information Geometry and Its Applications. AMS, vol. 194. Springer, Tokyo (2016). https://doi.org/10.1007/978-4-431-55978-8

  2. Andersen, E.B.: Sufficiency and exponential families for discrete sample spaces. J. Am. Stat. Assoc. 65(331), 1248–1255 (1970)

  3. Anderson, D.F., Craciun, G., Kurtz, T.G.: Product-form stationary distributions for deficiency zero chemical reaction networks. Bull. Math. Biol. 72(8), 1947–1970 (2010)

  4. Angeli, D., De Leenheer, P., Sontag, E.: A Petri net approach to persistence analysis in chemical reaction networks. In: Queinnec, I., Tarbouriech, S., Garcia, G., Niculescu, S.-I. (eds.) Biology and Control Theory: Current Challenges. LNCIS, vol. 357, pp. 181–216. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71988-5_9

  5. Angeli, D., De Leenheer, P., Sontag, E.D.: A Petri net approach to the study of persistence in chemical reaction networks. Math. Biosci. 210(2), 598–618 (2007)

  6. Baez, J., Stay, M.: Algorithmic thermodynamics. Math. Struct. Comput. Sci. 22(5), 771–787 (2012)

  7. Birch, M.W.: Maximum likelihood in three-way contingency tables. J. R. Stat. Soc. Ser. B 25, 220–233 (1963)

  8. Buisman, H.J., ten Eikelder, H.M.M., Hilbers, P.A.J., Liekens, A.M.L.: Computing algebraic functions with biochemical reaction networks. Artif. Life 15(1), 5–19 (2009)

  9. Cardelli, L., Kwiatkowska, M., Whitby, M.: Chemical reaction network designs for asynchronous logic circuits. Nat. Comput. 17(1), 109–130 (2018)

  10. Cencov, N.N.: Statistical Decision Rules and Optimal Inference. Translations of Mathematical Monographs, vol. 53. American Mathematical Society, Providence (2000)

  11. Chen, H.-L., Doty, D., Soloveichik, D.: Deterministic function computation with chemical reaction networks. Nat. Comput. 13(4), 517–534 (2014)

  12. Csiszár, I., Matus, F.: Information projections revisited. IEEE Trans. Inf. Theory 49(6), 1474–1490 (2003)

  13. Csiszár, I., Shields, P.C.: Information theory and statistics: a tutorial. Found. Trends Commun. Inf. Theory 1(4), 417–528 (2004)

  14. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)

  15. Desvillettes, L., Fellner, K., Tang, B.Q.: Trend to equilibrium for reaction-diffusion systems arising from complex balanced chemical reaction networks. SIAM J. Math. Anal. 49(4), 2666–2709 (2017)

  16. Feinberg, M.: On chemical kinetics of a certain class. Arch. Ration. Mech. Anal. 46, 1–41 (1972)

  17. Feinberg, M.: Lectures on chemical reaction networks (1979). http://www.che.eng.ohio-state.edu/~FEINBERG/LecturesOnReactionNetworks/

  18. Gopalkrishnan, M.: Catalysis in reaction networks. Bull. Math. Biol. 73(12), 2962–2982 (2011)

  19. Gopalkrishnan, M.: A scheme for molecular computation of maximum likelihood estimators for log-linear models. In: Rondelez, Y., Woods, D. (eds.) DNA 2016. LNCS, vol. 9818, pp. 3–18. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43994-5_1

  20. Gopalkrishnan, M., Miller, E., Shiu, A.: A geometric approach to the global attractor conjecture. SIAM J. Appl. Dyn. Syst. 13(2), 758–797 (2014)

  21. Hjelmfelt, A., Weinberger, E.D., Ross, J.: Chemical implementation of neural networks and Turing machines. Proc. Natl. Acad. Sci. 88(24), 10983–10987 (1991)

  22. Horn, F.J.M.: Necessary and sufficient conditions for complex balancing in chemical kinetics. Arch. Ration. Mech. Anal. 49, 172–186 (1972)

  23. Horn, F.J.M.: The dynamics of open reaction systems. In: Mathematical Aspects of Chemical and Biochemical Problems and Quantum Chemistry. Proceedings of Symposia in Applied Mathematics, vol. VIII, New York (1974)

  24. Ikeda, S., Tanaka, T., Amari, S.: Information geometry of turbo and low-density parity-check codes. IEEE Trans. Inf. Theory 50(6), 1097–1114 (2004)

  25. Ikeda, S., Tanaka, T., Amari, S.: Stochastic reasoning, free energy, and information geometry. Neural Comput. 16(9), 1779–1810 (2004)

  26. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620 (1957)

  27. MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003)

  28. McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions, vol. 382. Wiley, Hoboken (2007)

  29. Miller, E.: Theory and applications of lattice point methods for binomial ideals. In: Fløystad, G., Johnsen, T., Knutsen, A. (eds.) Combinatorial Aspects of Commutative Algebra and Algebraic Geometry, pp. 99–154. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19492-4_8

  30. Napp, N.E., Adams, R.P.: Message passing inference with chemical reaction networks. In: Advances in Neural Information Processing Systems, pp. 2247–2255 (2013)

  31. Oishi, K., Klavins, E.: Biomolecular implementation of linear I/O systems. Syst. Biol. IET 5(4), 252–260 (2011)

  32. Pachter, L., Sturmfels, B.: Algebraic Statistics for Computational Biology, vol. 13. Cambridge University Press, Cambridge (2005)

  33. Poole, W., et al.: Chemical Boltzmann machines. In: Brijder, R., Qian, L. (eds.) DNA 2017. LNCS, vol. 10467, pp. 210–231. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66799-7_14

  34. Qian, L., Winfree, E.: Scaling up digital circuit computation with DNA strand displacement cascades. Science 332(6034), 1196–1201 (2011)

  35. Qian, L., Winfree, E., Bruck, J.: Neural network computation with DNA strand displacement cascades. Nature 475(7356), 368–372 (2011)

  36. Sarpeshkar, R.: Analog synthetic biology. Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 372(2012), 20130110 (2014)

  37. Soloveichik, D., Cook, M., Winfree, E., Bruck, J.: Computation with finite stochastic chemical reaction networks. Nat. Comput. 7(4), 615–633 (2008)

  38. Tribus, M., McIrvine, E.C.: Energy and information. Sci. Am. 225(3), 179–190 (1971)

  39. Van Kampen, N.G.: Stochastic Processes in Physics and Chemistry, vol. 1. Elsevier, New York (1992)

  40. Virinchi, M.V., Behera, A., Gopalkrishnan, M.: A stochastic molecular scheme for an artificial cell to infer its environment from partial observations. In: Brijder, R., Qian, L. (eds.) DNA 2017. LNCS, vol. 10467, pp. 82–97. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66799-7_6

  41. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)

  42. Wiener, N.: Cybernetics or Control and Communication in the Animal and the Machine, vol. 25. MIT Press, Cambridge (1961)

  43. Zechner, C., Seelig, G., Rullan, M., Khammash, M.: Molecular circuits for dynamic noise filtering. Proc. Natl. Acad. Sci. 113(17), 4729–4734 (2016)

  44. Zellner, A.: Optimal information processing and Bayes’s theorem. Am. Stat. 42(4), 278–280 (1988)

Author information

Correspondence to Manoj Gopalkrishnan.

Appendix A

Proof (of Theorem 2)

(1) Fix \(\alpha \in \mathbb {R}^n_{>0}\). We first prove uniqueness: suppose for contradiction that there are at least two points of intersection \(\alpha ^*_1,\alpha ^*_2\) of the polytope \((\alpha + V^\perp )\cap \mathbb {R}^n_{\ge 0}\) with the hypersurface \(e^V\). Since \(\alpha ^*_1,\alpha ^*_2\in e^V\), we have \(\log \alpha ^*_1 - \log \alpha ^*_2\in V\). Since \(\alpha - \alpha ^*_1\in V^\perp \), we have \((\alpha -\alpha ^*_1)\cdot (\log \alpha ^*_1 - \log \alpha ^*_2)=0\). Then by the Pythagorean theorem, \(D(\alpha \Vert \alpha ^*_1) = D(\alpha \Vert \alpha ^*_2) + D(\alpha ^*_2\Vert \alpha ^*_1)\), which implies \(D(\alpha \Vert \alpha ^*_1)\ge D(\alpha \Vert \alpha ^*_2)\). By a symmetric argument, \(D(\alpha \Vert \alpha ^*_2)\ge D(\alpha \Vert \alpha ^*_1)\), and we conclude \(D(\alpha \Vert \alpha ^*_2)= D(\alpha \Vert \alpha ^*_1)\). In particular, \(D(\alpha ^*_2\Vert \alpha ^*_1)=0\), which implies \(\alpha ^*_1=\alpha ^*_2\) by Note 1.

To prove that there exists at least one point of intersection, and to show (2), fix \(\beta \in e^V\). We will show that the E-Projection \(\alpha ^*\) of \(\beta \) to \((\alpha +V^\perp )\cap \mathbb {R}^n_{\ge 0}\) belongs to \(e^V\). This point \(\alpha ^*\) exists since \(D(x\Vert \beta )\) is continuous in x, and hence attains its minimum over the compact set \((\alpha +V^\perp )\cap \mathbb {R}^n_{\ge 0}\). Further, because \(\alpha ^*\) is a minimizer, the function \(f(\lambda ) = D((1-\lambda )\alpha ^* + \lambda \alpha \Vert \beta )\) satisfies \(\lim _{\lambda \rightarrow 0^+} \frac{d f(\lambda )}{d\lambda }=0\). That is, \((\alpha - \alpha ^*)\cdot \log \frac{\alpha ^*}{\beta }=0\). Since \(\alpha \) could have been replaced by any other point of \((\alpha +V^\perp )\cap \mathbb {R}^n_{>0}\), the vector \(\log \frac{\alpha ^*}{\beta }\) is orthogonal to \(V^\perp \), hence lies in V, and therefore \(\alpha ^*\in e^V\).

(3) now follows because \(\alpha ^*\in e^V\) implies \(D(\alpha \Vert \alpha ^*) + D(\alpha ^*\Vert \beta ) = D(\alpha \Vert \beta )\) for all \(\beta \in e^V\), hence \(\alpha ^*\) is the M-Projection of \(\alpha \) to \(e^V\).
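
The Pythagorean identity at the heart of this argument is easy to check numerically. The sketch below (our illustration, assuming NumPy/SciPy and an example matrix A whose rows span V) computes the Birch point \(\alpha ^*\) by solving the moment-matching condition \(A\alpha ^* = A\alpha \) over \(e^V = \{\exp (A^T u)\}\), then verifies \(D(\alpha \Vert \beta ) = D(\alpha \Vert \alpha ^*) + D(\alpha ^*\Vert \beta )\) for an arbitrary \(\beta \in e^V\):

```python
import numpy as np
from scipy.optimize import fsolve

def D(x, y):
    """Generalized KL divergence used throughout the appendix."""
    return float(np.sum(x * np.log(x / y) - x + y))

A = np.array([[2.0, 1.0, 0.0],      # rows of A span V, so e^V = {exp(A^T u)}
              [0.0, 1.0, 2.0]])
alpha = np.array([0.3, 0.5, 0.2])   # any point of R^n_{>0}

# Birch point: alpha* in e^V with alpha - alpha* in V-perp, i.e. A alpha* = A alpha.
u = fsolve(lambda u: A @ np.exp(A.T @ u) - A @ alpha, np.zeros(A.shape[0]))
alpha_star = np.exp(A.T @ u)

beta = np.exp(A.T @ np.array([0.1, -0.4]))   # an arbitrary point of e^V

# Both printed numbers agree, as Theorem 2(3) predicts.
print(D(alpha, beta), D(alpha, alpha_star) + D(alpha_star, beta))
```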

Proof (of Theorem 6)

  1. From the chain rule, \(\dot{D}(x(t) \Vert y_A\circ \theta (t)) = ( \nabla _x D\cdot \dot{x} + \nabla _\theta D\cdot \dot{\theta })|_{(x(t),\theta (t))}\). By Theorem 4, the first term is nonpositive, with equality iff x(t) is the E-Projection of \(y_A\circ \theta (t)\) onto \(H_{x_0}\). By Theorem 5, the second term is nonpositive, with equality iff \(y_A\circ \theta (t)\) is the M-Projection of x(t) onto \(y_A(\mathbb {R}^m)\). Hence \(dD(x(t) \Vert y_A\circ \theta (t))/dt \le 0\), with equality iff both conditions hold simultaneously.

  2. Since \(D(x(t) \Vert y_A\circ \theta (t))\) is bounded below and nonincreasing by (1), it converges, and \(dD(x(t) \Vert y_A\circ \theta (t))/dt \rightarrow 0\). In this limit, by the argument above, both the E-Projection and M-Projection subnetworks are stationary, so that \(\dot{x}=0\) and \(\dot{\theta }=0\). Hence the limit \((\hat{x},\hat{\theta })=\lim _{t\rightarrow \infty }(x(t), \theta (t))\) exists.

  3. This follows since \(\nabla _\theta D(x\Vert y_A(\theta ))|_{(\hat{x},\hat{\theta })} = \dot{\theta }(t)/\theta (t)|_{(\hat{x},\hat{\theta })}=0\) when \(\hat{\theta }\in \mathbb {R}^\varTheta _{>0}\).

  4. We have \(\nabla _x D(x\Vert y_A(\theta ))|_{(\hat{x},\hat{\theta })} = \log \left( \frac{\hat{x}}{y_A(\hat{\theta })}\right) \). By (1), the point \(\hat{x}\) is the E-Projection of \(y_A(\hat{\theta })\) to \(H_{x_0}\). Hence by Theorem 2, the point \(\hat{x}\) is the Birch point of \(x_0\) relative to the affine space \(\log y_A(\mathbb {R}^m)\), so that \((x - \hat{x})\cdot \log \left( \frac{\hat{x}}{y_A(\hat{\theta })}\right) =0\) for all \(x\in H_{x_0}\). Hence the gradient \(\nabla _x D(x\Vert y_A(\theta ))|_{(\hat{x},\hat{\theta })}\) is perpendicular to \(H_{x_0}\).
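
Theorem 6 is a continuous-time counterpart of the classical discrete-time EM iteration, which alternates the same two projections. The following sketch (our illustration, not the paper's reaction network itself; it reuses the matrices of Example 8 below together with a hypothetical data vector \(x_0\)) computes each E-Projection and M-Projection numerically and prints \(D(x\Vert y_A(\theta ))\), which decreases monotonically exactly as in part (1):

```python
import numpy as np
from scipy.optimize import fsolve

def D(x, y):
    return float(np.sum(x * np.log(x / y) - x + y))

A = np.array([[2.0, 1.0, 0.0],    # design matrix of Example 8
              [0.0, 1.0, 2.0]])
S = np.array([[1.0, 0.0, 1.0],    # observation matrix of Example 8
              [1.0, 1.0, 1.0]])
x0 = np.array([0.2, 0.3, 0.5])    # hypothetical data vector
obs = S @ x0                      # the observed summary S x0

x, theta = x0.copy(), np.ones(2)
for step in range(15):
    y = np.exp(A.T @ np.log(theta))                 # y_A(theta)
    # E-Projection of y onto H_{x0}: x = y * exp(S^T lam) with S x = S x0.
    lam = fsolve(lambda l: S @ (y * np.exp(S.T @ l)) - obs, np.zeros(2))
    x = y * np.exp(S.T @ lam)
    # M-Projection of x onto y_A(R^m): moment matching A y_A(theta) = A x.
    u = fsolve(lambda v: A @ np.exp(A.T @ v) - A @ x, np.log(theta))
    theta = np.exp(u)
    print(step, D(x, np.exp(A.T @ u)))              # nonincreasing sequence
```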

Example 8

Consider \(A=\begin{pmatrix}2&1&0\\ 0&1&2\end{pmatrix}\) and \(\mathcal {S}=\begin{pmatrix}1&0&1\\ 1&1&1\end{pmatrix}\). The vector \(\begin{pmatrix}1 \\ 0 \\ -1 \end{pmatrix}\) spans \(\ker \mathcal {S}\). The corresponding reaction network is

$$\begin{array}{llll} X_1\rightarrow X_1 + 2\theta _1, & 2\theta _1\rightarrow 0, & X_2\rightarrow X_2+\theta _1+\theta _2, & \theta _1+\theta _2\rightarrow 0,\\ X_3\rightarrow X_3+2\theta _2, & 2\theta _2\rightarrow 0, & X_1 + 2\theta _2\rightarrow X_3 + 2\theta _2, & X_3 + 2\theta _1\rightarrow X_1 + 2\theta _1. \end{array}$$

Here the concentration of \(X_2\) remains invariant with time. Let c be the initial concentration of \(X_2\). If \(c < 1/3\) then the system admits two stable equilibria and one unstable equilibrium. The points \((y_1,c,y_2,\sqrt{y_1},\sqrt{y_2})\) and \((y_2,c,y_1,\sqrt{y_2},\sqrt{y_1})\) are the stable equilibria where \(y_1=\frac{1-c}{2}+\frac{\sqrt{(1-3c)(1+c)}}{2}\) and \(y_2=\frac{1-c}{2}-\frac{\sqrt{(1-3c)(1+c)}}{2}\), and \(\left( \frac{1-c}{2},c,\frac{1-c}{2},\sqrt{\frac{1}{3}},\sqrt{\frac{1}{3}}\right) \) is the unstable equilibrium. On the other hand, if \(c\ge 1/3\) then there is only one equilibrium point at \(\left( \frac{1-c}{2},c,\frac{1-c}{2},\sqrt{\frac{1}{3}},\sqrt{\frac{1}{3}}\right) \), and this point is stable.
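
These equilibria can be checked by integrating the mass-action kinetics directly. Below is a minimal sketch (our illustration, assuming unit rate constants and SciPy's integrator); with \(c=0.2<1/3\) the trajectory settles at one of the two stable equilibria, and which one it reaches depends on the initial \(\theta \) concentrations:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Mass-action ODEs for the network of Example 8, all rate constants set to 1.
def rhs(t, z):
    x1, x2, x3, t1, t2 = z
    return [-x1 * t2**2 + x3 * t1**2,       # X1 + 2θ2 -> X3 + 2θ2 and its mirror
            0.0,                             # no reaction changes X2
            x1 * t2**2 - x3 * t1**2,
            2*x1 + x2 - 2*t1**2 - t1*t2,     # θ1: produced by X1, X2; decays
            2*x3 + x2 - 2*t2**2 - t1*t2]     # θ2: produced by X3, X2; decays

c = 0.2                                      # initial concentration of X2
for theta1_init in (0.1, 2.0):               # two choices of initial θ1
    z0 = [0.4, c, 0.4, theta1_init, 0.5]     # total X-concentration is 1
    sol = solve_ivp(rhs, (0.0, 500.0), z0, rtol=1e-10, atol=1e-12)
    print(np.round(sol.y[:, -1], 4))

# For comparison: y1, y2 = (1-c)/2 ± sqrt((1-3c)(1+c))/2 = 0.7464, 0.0536 at c = 0.2.
```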

Example 9

Consider \(A=\begin{pmatrix}2&1&0\\ 0&1&2\end{pmatrix}\) and \(\mathcal {S}=\begin{pmatrix}1&-1&0\\ 1&1&1\end{pmatrix}\). The vector \(\begin{pmatrix}1 \\ 1 \\ -2 \end{pmatrix}\) spans \(\ker \mathcal {S}\). The corresponding reaction network is

$$\begin{array}{llll} X_1\rightarrow X_1 + 2\theta _1, & 2\theta _1\rightarrow 0, & X_2\rightarrow X_2+\theta _1+\theta _2, & X_1 + X_2+3\theta _2\rightarrow 2X_3 +3\theta _2,\\ X_3\rightarrow X_3+2\theta _2, & 2\theta _2\rightarrow 0, & \theta _1+\theta _2\rightarrow 0, & 2X_3 + 3\theta _1\rightarrow X_1 + X_2+3\theta _1. \end{array}$$

Here the set \(\{X_1,X_2,\theta _1\}\) is a critical siphon. If we start at the initial concentrations \(x_1=0.05,x_2=0.05,x_3=0.9,\theta _1=0.1,\theta _2=1.0\), then the system converges to \(x_1=0,x_2=0,x_3=1,\theta _1=0,\theta _2=1\); hence this system is not persistent. This limit corresponds to one explanation of the data: all the outcomes were of type \(X_3\). If instead we start at \(\theta _1=0.5,\theta _2=1.0\) and the same x concentrations, then the system converges to \(x_1=x_2=x_3=1/3\), \(\theta _1=\theta _2=1/\sqrt{3}\). This corresponds to a different explanation of the same data: all three outcomes occurred equally frequently.
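
The loss of persistence along the critical siphon, and its dependence on the initial \(\theta _1\), can be seen the same way (again a sketch with unit rate constants):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Mass-action ODEs for the network of Example 9, all rate constants set to 1.
def rhs(t, z):
    x1, x2, x3, t1, t2 = z
    flux = x1 * x2 * t2**3 - x3**2 * t1**3   # net rate of X1 + X2 -> 2 X3
    return [-flux, -flux, 2 * flux,
            2*x1 + x2 - 2*t1**2 - t1*t2,
            2*x3 + x2 - 2*t2**2 - t1*t2]

for theta1_init in (0.1, 0.5):               # the two initializations in Example 9
    z0 = [0.05, 0.05, 0.9, theta1_init, 1.0]
    sol = solve_ivp(rhs, (0.0, 2000.0), z0, rtol=1e-10, atol=1e-14)
    print(np.round(sol.y[:, -1], 3))         # (0,0,1,0,1) vs (1/3,1/3,1/3,0.577,0.577)
```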

Example 10

Boltzmann machines are a popular model in machine learning. Formally, a Boltzmann machine is a graph \(G=(V,E)\), each of whose nodes can be in state 1 or 0. To every configuration \(s \in \{0,1\}^V\) of the Boltzmann machine one associates an energy \(E(s)=-\sum _i b_i s_i-\sum _{ij}w_{ij}s_is_j\). The probability of the Boltzmann machine being in configuration s is given by the exponential family \(P(s;b,w)\propto \exp (-E(s))\). Boltzmann machines can perform inference conditioned on partial observations, and the maximum likelihood values of the parameters \(b_i,w_{ij}\) can be learned by stochastic gradient descent.
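
Spelled out for a small machine, the model is just an exponential family over the \(2^{|V|}\) configurations. A short sketch with hypothetical biases and weights:

```python
import itertools
import numpy as np

b = {1: 0.2, 2: -0.1, 3: 0.4}        # hypothetical biases b_i
w = {(1, 2): 0.3, (2, 3): -0.5}      # hypothetical weights w_ij on the edges

def energy(s):
    # E(s) = -sum_i b_i s_i - sum_{ij} w_ij s_i s_j
    return (-sum(b[i] * s[i] for i in b)
            - sum(w_ij * s[i] * s[j] for (i, j), w_ij in w.items()))

states = [dict(zip(b, bits)) for bits in itertools.product((0, 1), repeat=len(b))]
weights = np.array([np.exp(-energy(s)) for s in states])
print(np.round(weights / weights.sum(), 4))   # P(s; b, w) ∝ exp(-E(s))
```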

Our EM scheme can be used to implement the learning rule of arbitrary Boltzmann machines in chemistry. We illustrate the construction on the 3-node Boltzmann machine with \(V=\{x_1,x_2,x_3\}\):

[Figure: the three-node Boltzmann machine, a path graph on \(x_1,x_2,x_3\) with edges \(\{x_1,x_2\}\) and \(\{x_2,x_3\}\)]

with biases \(b_1,b_2,b_3\) and weights \(w_{12}\) and \(w_{23}\). We will work with the parameters \(\theta _i=\exp (b_i)\) and \(\theta _{ij}=\exp (w_{ij})\). Ordering the configurations \(s=(s_1,s_2,s_3)\) as \(000, 001, \dots , 111\), the design matrix \(A=(a_{ij})_{5\times 8}\), whose rows record the exponents of \(\theta _1,\theta _2,\theta _3,\theta _{12},\theta _{23}\), is

$$A=\begin{pmatrix} 0&0&0&0&1&1&1&1\\ 0&0&1&1&0&0&1&1\\ 0&1&0&1&0&1&0&1\\ 0&0&0&0&0&0&1&1\\ 0&0&0&1&0&0&0&1 \end{pmatrix}$$

and the corresponding exponential model \(y_A:\mathbb {R}^5\rightarrow \mathbb {R}^8_{>0}\) sends \(\theta = (\theta _1,\theta _2,\theta _3,\theta _{ 12},\theta _{23}) \longmapsto (\theta ^{a_{.1}},\theta ^{a_{.2}},\dots ,\theta ^{a_{.8}})\). If the node \(x_2\) is hidden, then the observation matrix \(\mathcal S\), whose rows are indexed by the visible configurations \((x_1,x_3)=00,01,10,11\), is

$$\mathcal {S}=\begin{pmatrix} 1&0&1&0&0&0&0&0\\ 0&1&0&1&0&0&0&0\\ 0&0&0&0&1&0&1&0\\ 0&0&0&0&0&1&0&1 \end{pmatrix}$$

Our EM scheme yields the corresponding reaction network, constructed from the columns of \(A\) and a basis of \(\ker \mathcal {S}\) as in Examples 8 and 9.

Suppose we observe the marginal distribution (0.24, 0.04, 0.17, 0.55) on the visible nodes \(x_1,x_3\). To solve for the maximum likelihood \(\hat{\theta }\), we initialize the system with \(X_{000}=0.24,X_{001}=0.04,X_{010}=0,X_{011}=0,X_{100}=0.17,X_{101}=0.55,X_{110}=0,X_{111}=0\) and all \(\theta \)'s set to 1. The system then reaches steady state at \(\hat{\theta }_1=0.5176,\hat{\theta }_2=0.0018,\hat{\theta }_3=0.3881,\hat{\theta }_{12}=0.8246,\hat{\theta }_{23}=0.7969, \hat{X}_{000}=0.2391,\hat{X}_{001}=0.0389, \hat{X}_{010}=0.0009, \hat{X}_{011}=0.0011, \hat{X}_{100}=0.1695, \hat{X}_{101}=0.5487, \hat{X}_{110}=0.0005, \hat{X}_{111}=0.0013\).

[Figure: the reaction network generated by the EM scheme for this Boltzmann machine]
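
The matrices \(A\) and \(\mathcal {S}\) above need not be written out by hand; they follow mechanically from the graph and the choice of hidden node. A sketch of that bookkeeping (our illustration), which also evaluates the unnormalized model point \(y_A(\hat{\theta })\) at the reported parameters:

```python
import itertools
import numpy as np

edges = [(0, 1), (1, 2)]                  # x1–x2 and x2–x3, 0-indexed
states = list(itertools.product((0, 1), repeat=3))   # columns 000, 001, ..., 111

# Design matrix A: rows give the exponents of θ1, θ2, θ3, θ12, θ23 per configuration.
A = np.array([[s[i] for s in states] for i in range(3)] +
             [[s[i] * s[j] for s in states] for (i, j) in edges])

# Observation matrix S: x2 is hidden, rows indexed by visible pairs (x1, x3).
visible = list(itertools.product((0, 1), repeat=2))
S = np.array([[1 if (s[0], s[2]) == v else 0 for s in states] for v in visible])

theta_hat = np.array([0.5176, 0.0018, 0.3881, 0.8246, 0.7969])  # reported values
y_hat = np.exp(A.T @ np.log(theta_hat))   # y_A(θ-hat)
print(A, S, np.round(y_hat, 4), sep="\n\n")
```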


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Viswa Virinchi, M., Behera, A., Gopalkrishnan, M. (2018). A Reaction Network Scheme Which Implements the EM Algorithm. In: Doty, D., Dietz, H. (eds) DNA Computing and Molecular Programming. DNA 2018. Lecture Notes in Computer Science, vol. 11145. Springer, Cham. https://doi.org/10.1007/978-3-030-00030-1_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-00030-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00029-5

  • Online ISBN: 978-3-030-00030-1

  • eBook Packages: Computer Science, Computer Science (R0)
