A Reaction Network Scheme Which Implements the EM Algorithm

  • Conference paper

DNA Computing and Molecular Programming (DNA 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11145)

Abstract

A detailed algorithmic explanation is required for how a network of chemical reactions can generate the sophisticated behavior displayed by living cells. Though several previous works have shown that reaction networks are computationally universal and can in principle implement any algorithm, there is scope for constructions that map well onto biological reality, make efficient use of the computational potential of the native dynamics of reaction networks, and make contact with statistical mechanics. We describe a new reaction network scheme for solving a large class of statistical problems. Specifically we show how reaction networks can implement information projection, and consequently a generalized Expectation-Maximization algorithm, to solve maximum likelihood estimation problems in partially-observed exponential families on categorical data. Our scheme can be thought of as an algorithmic interpretation of E. T. Jaynes’s vision of statistical mechanics as statistical inference.


Notes

  1. This situation can arise because only a linear projection is observable. It can also happen because we require a rich family of probability distributions on the space of observations, but do not want to give up the nice properties of exponential families. We can achieve both by imagining that our observation s arises as the projection of a data vector x living in a higher-dimensional space, and then employing an exponential family of probability distributions on this higher-dimensional space.

References

  1. Amari, S.: Information Geometry and Its Applications. AMS, vol. 194. Springer, Tokyo (2016). https://doi.org/10.1007/978-4-431-55978-8

  2. Andersen, E.B.: Sufficiency and exponential families for discrete sample spaces. J. Am. Stat. Assoc. 65(331), 1248–1255 (1970)

  3. Anderson, D.F., Craciun, G., Kurtz, T.G.: Product-form stationary distributions for deficiency zero chemical reaction networks. Bull. Math. Biol. 72(8), 1947–1970 (2010)

  4. Angeli, D., De Leenheer, P., Sontag, E.: A Petri net approach to persistence analysis in chemical reaction networks. In: Queinnec, I., Tarbouriech, S., Garcia, G., Niculescu, S.-I. (eds.) Biology and Control Theory: Current Challenges. LNCIS, vol. 357, pp. 181–216. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71988-5_9

  5. Angeli, D., De Leenheer, P., Sontag, E.D.: A Petri net approach to the study of persistence in chemical reaction networks. Math. Biosci. 210(2), 598–618 (2007)

  6. Baez, J., Stay, M.: Algorithmic thermodynamics. Math. Struct. Comput. Sci. 22(5), 771–787 (2012)

  7. Birch, M.W.: Maximum likelihood in three-way contingency tables. J. R. Stat. Soc. Ser. B 25, 220–233 (1963)

  8. Buisman, H.J., ten Eikelder, H.M.M., Hilbers, P.A.J., Liekens, A.M.L.: Computing algebraic functions with biochemical reaction networks. Artif. Life 15(1), 5–19 (2009)

  9. Cardelli, L., Kwiatkowska, M., Whitby, M.: Chemical reaction network designs for asynchronous logic circuits. Nat. Comput. 17(1), 109–130 (2018)

  10. Cencov, N.N.: Statistical Decision Rules and Optimal Inference. Translations of Mathematical Monographs, vol. 53. American Mathematical Society, Providence (2000)

  11. Chen, H.-L., Doty, D., Soloveichik, D.: Deterministic function computation with chemical reaction networks. Nat. Comput. 13(4), 517–534 (2014)

  12. Csiszár, I., Matus, F.: Information projections revisited. IEEE Trans. Inf. Theory 49(6), 1474–1490 (2003)

  13. Csiszár, I., Shields, P.C.: Information theory and statistics: a tutorial. Found. Trends Commun. Inf. Theory 1(4), 417–528 (2004)

  14. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)

  15. Desvillettes, L., Fellner, K., Tang, B.Q.: Trend to equilibrium for reaction-diffusion systems arising from complex balanced chemical reaction networks. SIAM J. Math. Anal. 49(4), 2666–2709 (2017)

  16. Feinberg, M.: On chemical kinetics of a certain class. Arch. Ration. Mech. Anal. 46, 1–41 (1972)

  17. Feinberg, M.: Lectures on chemical reaction networks (1979). http://www.che.eng.ohio-state.edu/~FEINBERG/LecturesOnReactionNetworks/

  18. Gopalkrishnan, M.: Catalysis in reaction networks. Bull. Math. Biol. 73(12), 2962–2982 (2011)

  19. Gopalkrishnan, M.: A scheme for molecular computation of maximum likelihood estimators for log-linear models. In: Rondelez, Y., Woods, D. (eds.) DNA 2016. LNCS, vol. 9818, pp. 3–18. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43994-5_1

  20. Gopalkrishnan, M., Miller, E., Shiu, A.: A geometric approach to the global attractor conjecture. SIAM J. Appl. Dyn. Syst. 13(2), 758–797 (2014)

  21. Hjelmfelt, A., Weinberger, E.D., Ross, J.: Chemical implementation of neural networks and Turing machines. Proc. Natl. Acad. Sci. 88(24), 10983–10987 (1991)

  22. Horn, F.J.M.: Necessary and sufficient conditions for complex balancing in chemical kinetics. Arch. Ration. Mech. Anal. 49, 172–186 (1972)

  23. Horn, F.J.M.: The dynamics of open reaction systems. In: Mathematical Aspects of Chemical and Biochemical Problems and Quantum Chemistry. Proceedings of Symposia in Applied Mathematics, vol. VIII, New York (1974)

  24. Ikeda, S., Tanaka, T., Amari, S.: Information geometry of turbo and low-density parity-check codes. IEEE Trans. Inf. Theory 50(6), 1097–1114 (2004)

  25. Ikeda, S., Tanaka, T., Amari, S.: Stochastic reasoning, free energy, and information geometry. Neural Comput. 16(9), 1779–1810 (2004)

  26. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620 (1957)

  27. MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003)

  28. McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions, vol. 382. Wiley, Hoboken (2007)

  29. Miller, E.: Theory and applications of lattice point methods for binomial ideals. In: Fløystad, G., Johnsen, T., Knutsen, A. (eds.) Combinatorial Aspects of Commutative Algebra and Algebraic Geometry, pp. 99–154. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19492-4_8

  30. Napp, N.E., Adams, R.P.: Message passing inference with chemical reaction networks. In: Advances in Neural Information Processing Systems, pp. 2247–2255 (2013)

  31. Oishi, K., Klavins, E.: Biomolecular implementation of linear I/O systems. Syst. Biol. IET 5(4), 252–260 (2011)

  32. Pachter, L., Sturmfels, B.: Algebraic Statistics for Computational Biology, vol. 13. Cambridge University Press, Cambridge (2005)

  33. Poole, W., et al.: Chemical Boltzmann machines. In: Brijder, R., Qian, L. (eds.) DNA 2017. LNCS, vol. 10467, pp. 210–231. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66799-7_14

  34. Qian, L., Winfree, E.: Scaling up digital circuit computation with DNA strand displacement cascades. Science 332(6034), 1196–1201 (2011)

  35. Qian, L., Winfree, E., Bruck, J.: Neural network computation with DNA strand displacement cascades. Nature 475(7356), 368–372 (2011)

  36. Sarpeshkar, R.: Analog synthetic biology. Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 372(2012), 20130110 (2014)

  37. Soloveichik, D., Cook, M., Winfree, E., Bruck, J.: Computation with finite stochastic chemical reaction networks. Nat. Comput. 7(4), 615–633 (2008)

  38. Tribus, M., McIrvine, E.C.: Energy and information. Sci. Am. 225(3), 179–190 (1971)

  39. Van Kampen, N.G.: Stochastic Processes in Physics and Chemistry, vol. 1. Elsevier, New York (1992)

  40. Virinchi, M.V., Behera, A., Gopalkrishnan, M.: A stochastic molecular scheme for an artificial cell to infer its environment from partial observations. In: Brijder, R., Qian, L. (eds.) DNA 2017. LNCS, vol. 10467, pp. 82–97. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66799-7_6

  41. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)

  42. Wiener, N.: Cybernetics or Control and Communication in the Animal and the Machine, vol. 25. MIT Press, Cambridge (1961)

  43. Zechner, C., Seelig, G., Rullan, M., Khammash, M.: Molecular circuits for dynamic noise filtering. Proc. Natl. Acad. Sci. 113(17), 4729–4734 (2016)

  44. Zellner, A.: Optimal information processing and Bayes’s theorem. Am. Stat. 42(4), 278–280 (1988)

Author information

Correspondence to Manoj Gopalkrishnan.

Appendix A

Proof (of Theorem 2)

(1) Fix \(\alpha \in \mathbb {R}^n_{>0}\). We first prove uniqueness: suppose for contradiction that there are at least two points of intersection \(\alpha ^*_1,\alpha ^*_2\) of the polytope \((\alpha + V^\perp )\cap \mathbb {R}^n_{\ge 0}\) with the hypersurface \(e^V\). Since \(\alpha ^*_1,\alpha ^*_2\in e^V\), we have \(\log \alpha ^*_1 - \log \alpha ^*_2\in V\). Since \(\alpha - \alpha ^*_1\in V^\perp \), we have \((\alpha -\alpha ^*_1)\cdot (\log \alpha ^*_1 - \log \alpha ^*_2)=0\). Then by the Pythagorean theorem, \(D(\alpha \Vert \alpha ^*_1) = D(\alpha \Vert \alpha ^*_2) + D(\alpha ^*_2\Vert \alpha ^*_1)\), which implies \(D(\alpha \Vert \alpha ^*_1)\ge D(\alpha \Vert \alpha ^*_2)\). By a symmetric argument, \(D(\alpha \Vert \alpha ^*_2)\ge D(\alpha \Vert \alpha ^*_1)\), and we conclude \(D(\alpha \Vert \alpha ^*_2)= D(\alpha \Vert \alpha ^*_1)\). In particular, \(D(\alpha ^*_2\Vert \alpha ^*_1)=0\), which implies \(\alpha ^*_1=\alpha ^*_2\) by Note 1.

To prove that there exists at least one point of intersection, and to show (2), fix \(\beta \in e^V\). We will show that the E-Projection \(\alpha ^*\) of \(\beta \) to \((\alpha +V^\perp )\cap \mathbb {R}^n_{\ge 0}\) belongs to \(e^V\). This point \(\alpha ^*\) exists since \(D(x\Vert \beta )\) is continuous in x, and hence attains its minimum over the compact set \((\alpha +V^\perp )\cap \mathbb {R}^n_{\ge 0}\). Further, because \(\alpha ^*\) is a minimizer, the function \(f(\lambda ) = D((1-\lambda )\alpha ^* + \lambda \alpha \Vert \beta )\) satisfies \(\lim _{\lambda \rightarrow 0^+} \frac{d f(\lambda )}{d\lambda }=0\). That is, \((\alpha - \alpha ^*)\cdot \log \frac{\alpha ^*}{\beta }=0\). Since \(\alpha \) could have been replaced by any other point of \((\alpha +V^\perp )\cap \mathbb {R}^n_{>0}\), the vector \(\log \frac{\alpha ^*}{\beta }\) is orthogonal to \(V^\perp \), hence lies in V, and therefore \(\alpha ^*\in e^V\).

(3) now follows because \(\alpha ^*\in e^V\) implies \(D(\alpha \Vert \alpha ^*) + D(\alpha ^*\Vert \beta ) = D(\alpha \Vert \beta )\) for all \(\beta \in e^V\), hence \(\alpha ^*\) is the M-Projection of \(\alpha \) to \(e^V\).
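
The Pythagorean identity at the heart of this argument is easy to check numerically. The sketch below (our illustration, assuming NumPy/SciPy and an example matrix A whose rows span V) computes the Birch point \(\alpha ^*\) by solving the moment-matching condition \(A\alpha ^* = A\alpha \) over \(e^V = \{\exp (A^T u)\}\), then verifies \(D(\alpha \Vert \beta ) = D(\alpha \Vert \alpha ^*) + D(\alpha ^*\Vert \beta )\) for an arbitrary \(\beta \in e^V\):

```python
import numpy as np
from scipy.optimize import fsolve

def D(x, y):
    """Generalized KL divergence used throughout the appendix."""
    return float(np.sum(x * np.log(x / y) - x + y))

A = np.array([[2.0, 1.0, 0.0],      # rows of A span V, so e^V = {exp(A^T u)}
              [0.0, 1.0, 2.0]])
alpha = np.array([0.3, 0.5, 0.2])   # any point of R^n_{>0}

# Birch point: alpha* in e^V with alpha - alpha* in V-perp, i.e. A alpha* = A alpha.
u = fsolve(lambda u: A @ np.exp(A.T @ u) - A @ alpha, np.zeros(A.shape[0]))
alpha_star = np.exp(A.T @ u)

beta = np.exp(A.T @ np.array([0.1, -0.4]))   # an arbitrary point of e^V

# Both printed numbers agree, as Theorem 2(3) predicts.
print(D(alpha, beta), D(alpha, alpha_star) + D(alpha_star, beta))
```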

Proof (of Theorem 6)

  1. From the chain rule, \(\dot{D}(x(t) \Vert y_A\circ \theta (t)) = ( \nabla _x D\cdot \dot{x} + \nabla _\theta D\cdot \dot{\theta })|_{(x(t),\theta (t))}\). By Theorem 4, the first term is nonpositive, with equality iff x(t) is the E-Projection of \(y_A\circ \theta (t)\) onto \(H_{x_0}\). By Theorem 5, the second term is nonpositive, with equality iff \(y_A\circ \theta (t)\) is the M-Projection of x(t) onto \(y_A(\mathbb {R}^m)\). Hence \(dD(x(t) \Vert y_A\circ \theta (t))/dt \le 0\), with equality iff both conditions hold simultaneously.

  2. Since \(D(x(t) \Vert y_A\circ \theta (t))\) is bounded below and nonincreasing by (1), it converges, and \(dD(x(t) \Vert y_A\circ \theta (t))/dt \rightarrow 0\). In this limit, by the argument above, both the E-Projection and M-Projection subnetworks are stationary, so that \(\dot{x}=0\) and \(\dot{\theta }=0\). Hence the limit \((\hat{x},\hat{\theta })=\lim _{t\rightarrow \infty }(x(t), \theta (t))\) exists.

  3. This follows since \(\nabla _\theta D(x\Vert y_A(\theta ))|_{(\hat{x},\hat{\theta })} = \dot{\theta }(t)/\theta (t)|_{(\hat{x},\hat{\theta })}=0\) when \(\hat{\theta }\in \mathbb {R}^\varTheta _{>0}\).

  4. We have \(\nabla _x D(x\Vert y_A(\theta ))|_{(\hat{x},\hat{\theta })} = \log \left( \frac{\hat{x}}{y_A(\hat{\theta })}\right) \). By (1), the point \(\hat{x}\) is the E-Projection of \(y_A(\hat{\theta })\) to \(H_{x_0}\). Hence by Theorem 2, the point \(\hat{x}\) is the Birch point of \(x_0\) relative to the affine space \(\log y_A(\mathbb {R}^m)\), so that \((x - \hat{x})\cdot \log \left( \frac{\hat{x}}{y_A(\hat{\theta })}\right) =0\) for all \(x\in H_{x_0}\). Hence the gradient \(\nabla _x D(x\Vert y_A(\theta ))|_{(\hat{x},\hat{\theta })}\) is perpendicular to \(H_{x_0}\).
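
Theorem 6 is a continuous-time counterpart of the classical discrete-time EM iteration, which alternates the same two projections. The following sketch (our illustration, not the paper's reaction network itself; it reuses the matrices of Example 8 below together with a hypothetical data vector \(x_0\)) computes each E-Projection and M-Projection numerically and prints \(D(x\Vert y_A(\theta ))\), which decreases monotonically exactly as in part (1):

```python
import numpy as np
from scipy.optimize import fsolve

def D(x, y):
    return float(np.sum(x * np.log(x / y) - x + y))

A = np.array([[2.0, 1.0, 0.0],    # design matrix of Example 8
              [0.0, 1.0, 2.0]])
S = np.array([[1.0, 0.0, 1.0],    # observation matrix of Example 8
              [1.0, 1.0, 1.0]])
x0 = np.array([0.2, 0.3, 0.5])    # hypothetical data vector
obs = S @ x0                      # the observed summary S x0

x, theta = x0.copy(), np.ones(2)
for step in range(15):
    y = np.exp(A.T @ np.log(theta))                 # y_A(theta)
    # E-Projection of y onto H_{x0}: x = y * exp(S^T lam) with S x = S x0.
    lam = fsolve(lambda l: S @ (y * np.exp(S.T @ l)) - obs, np.zeros(2))
    x = y * np.exp(S.T @ lam)
    # M-Projection of x onto y_A(R^m): moment matching A y_A(theta) = A x.
    u = fsolve(lambda v: A @ np.exp(A.T @ v) - A @ x, np.log(theta))
    theta = np.exp(u)
    print(step, D(x, np.exp(A.T @ u)))              # nonincreasing sequence
```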

Example 8

Consider \(A=\begin{pmatrix}2&1&0\\ 0&1&2\end{pmatrix}\) and \(\mathcal {S}=\begin{pmatrix}1&0&1\\ 1&1&1\end{pmatrix}\). The vector \(\begin{pmatrix}1 \\ 0 \\ -1 \end{pmatrix}\) spans \(\ker \mathcal {S}\). The corresponding reaction network is

$$\begin{array}{llll} X_1\rightarrow X_1 + 2\theta _1, & 2\theta _1\rightarrow 0, & X_2\rightarrow X_2+\theta _1+\theta _2, & \theta _1+\theta _2\rightarrow 0,\\ X_3\rightarrow X_3+2\theta _2, & 2\theta _2\rightarrow 0, & X_1 + 2\theta _2\rightarrow X_3 + 2\theta _2, & X_3 + 2\theta _1\rightarrow X_1 + 2\theta _1. \end{array}$$

Here the concentration of \(X_2\) remains invariant with time. Let c be the initial concentration of \(X_2\). If \(c < 1/3\) then the system admits two stable equilibria and one unstable equilibrium. The points \((y_1,c,y_2,\sqrt{y_1},\sqrt{y_2})\) and \((y_2,c,y_1,\sqrt{y_2},\sqrt{y_1})\) are the stable equilibria where \(y_1=\frac{1-c}{2}+\frac{\sqrt{(1-3c)(1+c)}}{2}\) and \(y_2=\frac{1-c}{2}-\frac{\sqrt{(1-3c)(1+c)}}{2}\), and \(\left( \frac{1-c}{2},c,\frac{1-c}{2},\sqrt{\frac{1}{3}},\sqrt{\frac{1}{3}}\right) \) is the unstable equilibrium. On the other hand, if \(c\ge 1/3\) then there is only one equilibrium point at \(\left( \frac{1-c}{2},c,\frac{1-c}{2},\sqrt{\frac{1}{3}},\sqrt{\frac{1}{3}}\right) \), and this point is stable.
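
These equilibria can be checked by integrating the mass-action kinetics directly. Below is a minimal sketch (our illustration, assuming unit rate constants and SciPy's integrator); with \(c=0.2<1/3\) the trajectory settles at one of the two stable equilibria, and which one it reaches depends on the initial \(\theta \) concentrations:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Mass-action ODEs for the network of Example 8, all rate constants set to 1.
def rhs(t, z):
    x1, x2, x3, t1, t2 = z
    return [-x1 * t2**2 + x3 * t1**2,       # X1 + 2θ2 -> X3 + 2θ2 and its mirror
            0.0,                             # no reaction changes X2
            x1 * t2**2 - x3 * t1**2,
            2*x1 + x2 - 2*t1**2 - t1*t2,     # θ1: produced by X1, X2; decays
            2*x3 + x2 - 2*t2**2 - t1*t2]     # θ2: produced by X3, X2; decays

c = 0.2                                      # initial concentration of X2
for theta1_init in (0.1, 2.0):               # two choices of initial θ1
    z0 = [0.4, c, 0.4, theta1_init, 0.5]     # total X-concentration is 1
    sol = solve_ivp(rhs, (0.0, 500.0), z0, rtol=1e-10, atol=1e-12)
    print(np.round(sol.y[:, -1], 4))

# For comparison: y1, y2 = (1-c)/2 ± sqrt((1-3c)(1+c))/2 = 0.7464, 0.0536 at c = 0.2.
```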

Example 9

Consider \(A=\begin{pmatrix}2&1&0\\ 0&1&2\end{pmatrix}\) and \(\mathcal {S}=\begin{pmatrix}1&-1&0\\ 1&1&1\end{pmatrix}\). The vector \(\begin{pmatrix}1 \\ 1 \\ -2 \end{pmatrix}\) spans \(\ker \mathcal {S}\). The corresponding reaction network is

$$\begin{array}{llll} X_1\rightarrow X_1 + 2\theta _1, & 2\theta _1\rightarrow 0, & X_2\rightarrow X_2+\theta _1+\theta _2, & X_1 + X_2+3\theta _2\rightarrow 2X_3 +3\theta _2,\\ X_3\rightarrow X_3+2\theta _2, & 2\theta _2\rightarrow 0, & \theta _1+\theta _2\rightarrow 0, & 2X_3 + 3\theta _1\rightarrow X_1 + X_2+3\theta _1. \end{array}$$

Here the set \(\{X_1,X_2,\theta _1\}\) is a critical siphon. If we start at the initial concentrations \(x_1=0.05,x_2=0.05,x_3=0.9,\theta _1=0.1,\theta _2=1.0\), then the system converges to \(x_1=0,x_2=0,x_3=1,\theta _1=0,\theta _2=1\); hence this system is not persistent. This limit corresponds to one explanation of the data: all the outcomes were of type \(X_3\). If instead we start at \(\theta _1=0.5,\theta _2=1.0\) and the same x concentrations, then the system converges to \(x_1=x_2=x_3=1/3\), \(\theta _1=\theta _2=1/\sqrt{3}\). This corresponds to a different explanation of the same data: all three outcomes occurred equally frequently.
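
The loss of persistence along the critical siphon, and its dependence on the initial \(\theta _1\), can be seen the same way (again a sketch with unit rate constants):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Mass-action ODEs for the network of Example 9, all rate constants set to 1.
def rhs(t, z):
    x1, x2, x3, t1, t2 = z
    flux = x1 * x2 * t2**3 - x3**2 * t1**3   # net rate of X1 + X2 -> 2 X3
    return [-flux, -flux, 2 * flux,
            2*x1 + x2 - 2*t1**2 - t1*t2,
            2*x3 + x2 - 2*t2**2 - t1*t2]

for theta1_init in (0.1, 0.5):               # the two initializations in Example 9
    z0 = [0.05, 0.05, 0.9, theta1_init, 1.0]
    sol = solve_ivp(rhs, (0.0, 2000.0), z0, rtol=1e-10, atol=1e-14)
    print(np.round(sol.y[:, -1], 3))         # (0,0,1,0,1) vs (1/3,1/3,1/3,0.577,0.577)
```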

Example 10

Boltzmann machines are a popular model in machine learning. Formally, a Boltzmann machine is a graph \(G=(V,E)\), each of whose nodes can be in state 1 or 0. To every configuration \(s \in \{0,1\}^V\) of the Boltzmann machine one associates an energy \(E(s)=-\sum _i b_i s_i-\sum _{ij}w_{ij}s_is_j\). The probability of the Boltzmann machine being in configuration s is given by the exponential family \(P(s;b,w)\propto \exp (-E(s))\). Boltzmann machines can perform inference conditioned on partial observations, and the maximum likelihood values of the parameters \(b_i,w_{ij}\) can be learned by stochastic gradient descent.
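
Spelled out for a small machine, the model is just an exponential family over the \(2^{|V|}\) configurations. A short sketch with hypothetical biases and weights:

```python
import itertools
import numpy as np

b = {1: 0.2, 2: -0.1, 3: 0.4}        # hypothetical biases b_i
w = {(1, 2): 0.3, (2, 3): -0.5}      # hypothetical weights w_ij on the edges

def energy(s):
    # E(s) = -sum_i b_i s_i - sum_{ij} w_ij s_i s_j
    return (-sum(b[i] * s[i] for i in b)
            - sum(w_ij * s[i] * s[j] for (i, j), w_ij in w.items()))

states = [dict(zip(b, bits)) for bits in itertools.product((0, 1), repeat=len(b))]
weights = np.array([np.exp(-energy(s)) for s in states])
print(np.round(weights / weights.sum(), 4))   # P(s; b, w) ∝ exp(-E(s))
```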

Our EM scheme can be used to implement the learning rule of arbitrary Boltzmann machines in chemistry. We illustrate the construction on the 3-node Boltzmann machine with \(V=\{x_1,x_2,x_3\}\):

[Figure: the three-node Boltzmann machine, a path graph on \(x_1,x_2,x_3\) with edges \(\{x_1,x_2\}\) and \(\{x_2,x_3\}\)]

with biases \(b_1,b_2,b_3\) and weights \(w_{12}\) and \(w_{23}\). We will work with the parameters \(\theta _i=\exp (b_i)\) and \(\theta _{ij}=\exp (w_{ij})\). Ordering the configurations \(s=(s_1,s_2,s_3)\) as \(000, 001, \dots , 111\), the design matrix \(A=(a_{ij})_{5\times 8}\), whose rows record the exponents of \(\theta _1,\theta _2,\theta _3,\theta _{12},\theta _{23}\), is

$$A=\begin{pmatrix} 0&0&0&0&1&1&1&1\\ 0&0&1&1&0&0&1&1\\ 0&1&0&1&0&1&0&1\\ 0&0&0&0&0&0&1&1\\ 0&0&0&1&0&0&0&1 \end{pmatrix}$$

and the corresponding exponential model \(y_A:\mathbb {R}^5\rightarrow \mathbb {R}^8_{>0}\) sends \(\theta = (\theta _1,\theta _2,\theta _3,\theta _{ 12},\theta _{23}) \longmapsto (\theta ^{a_{.1}},\theta ^{a_{.2}},\dots ,\theta ^{a_{.8}})\). If the node \(x_2\) is hidden, then the observation matrix \(\mathcal S\), whose rows are indexed by the visible configurations \((x_1,x_3)=00,01,10,11\), is

$$\mathcal {S}=\begin{pmatrix} 1&0&1&0&0&0&0&0\\ 0&1&0&1&0&0&0&0\\ 0&0&0&0&1&0&1&0\\ 0&0&0&0&0&1&0&1 \end{pmatrix}$$

Our EM scheme yields the corresponding reaction network, constructed from the columns of \(A\) and a basis of \(\ker \mathcal {S}\) as in Examples 8 and 9.

Suppose we observe the marginal distribution (0.24, 0.04, 0.17, 0.55) on the visible nodes \(x_1,x_3\). To solve for the maximum likelihood \(\hat{\theta }\), we initialize the system with \(X_{000}=0.24,X_{001}=0.04,X_{010}=0,X_{011}=0,X_{100}=0.17,X_{101}=0.55,X_{110}=0,X_{111}=0\) and all \(\theta \)'s set to 1. The system then reaches steady state at \(\hat{\theta }_1=0.5176,\hat{\theta }_2=0.0018,\hat{\theta }_3=0.3881,\hat{\theta }_{12}=0.8246,\hat{\theta }_{23}=0.7969, \hat{X}_{000}=0.2391,\hat{X}_{001}=0.0389, \hat{X}_{010}=0.0009, \hat{X}_{011}=0.0011, \hat{X}_{100}=0.1695, \hat{X}_{101}=0.5487, \hat{X}_{110}=0.0005, \hat{X}_{111}=0.0013\).

[Figure: the reaction network generated by the EM scheme for this Boltzmann machine]
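
The matrices \(A\) and \(\mathcal {S}\) above need not be written out by hand; they follow mechanically from the graph and the choice of hidden node. A sketch of that bookkeeping (our illustration), which also evaluates the unnormalized model point \(y_A(\hat{\theta })\) at the reported parameters:

```python
import itertools
import numpy as np

edges = [(0, 1), (1, 2)]                  # x1–x2 and x2–x3, 0-indexed
states = list(itertools.product((0, 1), repeat=3))   # columns 000, 001, ..., 111

# Design matrix A: rows give the exponents of θ1, θ2, θ3, θ12, θ23 per configuration.
A = np.array([[s[i] for s in states] for i in range(3)] +
             [[s[i] * s[j] for s in states] for (i, j) in edges])

# Observation matrix S: x2 is hidden, rows indexed by visible pairs (x1, x3).
visible = list(itertools.product((0, 1), repeat=2))
S = np.array([[1 if (s[0], s[2]) == v else 0 for s in states] for v in visible])

theta_hat = np.array([0.5176, 0.0018, 0.3881, 0.8246, 0.7969])  # reported values
y_hat = np.exp(A.T @ np.log(theta_hat))   # y_A(θ-hat)
print(A, S, np.round(y_hat, 4), sep="\n\n")
```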


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Viswa Virinchi, M., Behera, A., Gopalkrishnan, M. (2018). A Reaction Network Scheme Which Implements the EM Algorithm. In: Doty, D., Dietz, H. (eds) DNA Computing and Molecular Programming. DNA 2018. Lecture Notes in Computer Science, vol. 11145. Springer, Cham. https://doi.org/10.1007/978-3-030-00030-1_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-00030-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00029-5

  • Online ISBN: 978-3-030-00030-1

  • eBook Packages: Computer Science, Computer Science (R0)
