Programming discrete distributions with chemical reaction networks

We explore the range of probabilistic behaviours that can be engineered with Chemical Reaction Networks (CRNs). We give methods to “program” CRNs so that their steady state is chosen from some desired target distribution that has finite support in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbb {N}}^m$$\end{document}Nm, with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m \ge 1$$\end{document}m≥1. Moreover, any distribution with countable infinite support can be approximated with arbitrarily small error under the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L^1$$\end{document}L1 norm. We also give optimized schemes for special distributions, including the uniform distribution. Finally, we formulate a calculus to compute on distributions that is complete for finite support distributions, and can be compiled to a restricted class of CRNs that at steady state realize those distributions.


Introduction
Individual cells and viruses operate in a noisy environment and molecular interactions are inherently stochastic.How cells can tolerate and take advantage of noise (stochastic fluctuations) is a question of primary importance.It has been shown that noise has a functional role in cells [11]; indeed, some critical functions depend on the stochastic fluctuations of molecular populations and would be impossible in a deterministic setting.For instance, noise is fundamental for probabilistic differentiation of strategies in organisms, and is a key factor for evolution and adaptation [5].In Escherichia coli, randomly and independently of external inputs, a small sub-population of cells enters a non-growing state in which they can elude the action of antibiotics that can only kill actively growing bacterial cells.Thus, when a population of E. coli cells is treated with antibiotics, the persisted cells survive by virtue of their quiescence before resuming growth [14].This is an example in which molecular systems compute by producing a distribution.In other cases cells need to shape noise and compute on distributions instead of simply mean values.For example, in [16] the authors show, both mathematically and experimentally, that microRNA confers precision on the protein expression: it shapes the noise of genes in a way that decreases the intrinsic noise in protein expression, maintaining its expected value almost constant.Thus, although fundamentally important, the mechanisms used by cells to compute in a stochastic environment are not well understood.
Chemical Reaction Networks (CRNs) with mass action kinetics are a well studied formalism for modelling biochemical systems, more recently also used as a formal programming language [10].It has been shown that any CRN can be physically implemented by a corresponding DNA strand displacement circuit in a well-mixed solution [18].DNA-based circuits thus have the potential to operate inside cells and control their activity.Winfree and Qian have also shown that CRNs can be implemented on the surface of a DNA nanostructure [15], enabling localized computation and engineering biochemical systems where the molecular interactions occur between few components.When the number of interacting entities is small, the stochastic fluctuations intrinsic in molecular interactions play a predominant role in the time evolution of the system.As a consequence, "programming" a CRN to provide a particular probabilistic response for a subset of species, for example in response to environmental conditions, is important for engineering complex biochemical nano-devices and randomized algorithms.In this paper, we explore the capacity of CRNs to "program" discrete probability distributions.We aim to characterize the probabilistic behaviour that can be obtained, exploring both the capabilities of CRNs for producing distributions and for computing on distributions by composing them.
Contributions.We show that at steady state CRNs are able to compute any distribution with support in N m , with m ≥ 1.We propose an algorithm to systematically "program" a CRN so that its stochastic semantics at steady state approximates a given distribution with arbitrarily small error under the L 1 norm.The approximation is exact if the support of the distribution is finite.The resulting network has a number of reactions linear in the dimension of the support of the distribution and the output is produced monotonically allowing composition.Since distributions with large support can result in unwieldy networks, we also give optimised networks for special distributions, including a novel scheme for the uniform distribution.We formulate a calculus that is complete for finite support distributions, which can be compiled to a restricted class of CRNs that at steady state compute those distributions.The calculus allows for modelling of external influences on the species.Our results are of interest for a variety of scenarios in systems and synthetic biology.For example, they can be used to program a biased stochastic coin or a uniform distribution, thus enabling implementation of randomized algorithms and protocols in CRNs.Related work.It has been shown that CRNs with stochastic semantics are Turing complete, up to an arbitrarily small error [17].If we assume error-free computation, their computational power decreases: they can decide the class of the semi-linear predicates [4] and compute semi-linear functions [9].A first attempt to model distributions with CRNs can be found in [12], where the problem of producing a single distribution is studied.However, their circuits are approximated and cannot be composed to compute operations on distributions.

Chemical Reaction Networks
A chemical reaction network (CRN) (Λ, R) is a pair of finite sets, where Λ is the set of chemical species, |Λ| denotes its size, and R is a set of reactions.A reaction τ ∈ R is a triple τ = (r τ , p τ , k τ ), where r τ ∈ N |Λ| is the source complex, p τ ∈ N |Λ| is the product complex and k τ ∈ R >0 is the coefficient associated to the rate of the reaction, where we assume k τ = 1 if not specified; r τ and p τ represent the stoichiometry of reactants and products.Given a reaction τ 1 = ([1, 0, 1], [0, 2, 0], k 1 ) we often refer to it as τ 1 : λ 1 + λ 3 → k1 2λ 2 .The net change associated to τ is defined by υ τ = p τ − r τ .
We assume that the system is well stirred, that is, the probability of the next reaction occurring between two molecules is independent of the location of those molecules, at fixed volume V and temperature.Under these assumptions a configuration or state of the system x ∈ N |Λ| is given by the number of molecules of each species.A chemical reaction system (CRS) C = (Λ, R, x 0 ) is a tuple where (Λ, R) is a CRN and x 0 ∈ N |Λ| represents its initial condition.
Stochastic semantics.In this paper we consider CRNs with stochastic semantics.The propensity rate α τ of a reaction τ is a function of the current configuration of the system x such that α τ (x)dt is the probability that a reaction event occurs in the next infinitesimal interval dt.We assume mass action kinetics [2].That is, if τ : where r τ,i is the i−th component of vector r. 3 The time evolution of a CRS C = (Λ, R, x 0 ) can be modelled as a time-homogeneous Continuous Time Markov Chain (CTMC) (X C (t), t ∈ R ≥0 ), with state space S [2].When clear from the context we write , where x 0 is the initial configuration.P C (t)(x) represents the transient evolution of X, and can be calculated exactly by solving the Chemical Master Equation (CME) or by approximation of the CME [7].

Definition 1
The steady state distribution (or limit distribution) of X C is defined as π C = lim t→∞ P C (t).
Again, when clear from the context, instead of π C we simply write π. π calculates the percentage of time, in the long-run, that X spends in each state x ∈ S. If S is finite, then the above limit distribution always exists and is unique [13].In this paper we focus on discrete distributions, and will sometimes conflate the term distribution with probability mass function, defined next.Definition 2 Suppose that M : S → R m with m > 0 is a discrete random variable defined on a countable sample space S. Then the probability mass function For a pmf π : N m → [0, 1] we call J = {y ∈ N m |π(y) = 0} the support of π.A pmf is always associated to a discrete random variable whose distribution is described by the pmf.However, sometimes, when we refer to a pmf, we imply the associated random variable.Given two pmfs f 1 and f 2 with values in N m , m > 0, we define the L 1 norm (or distance) between them as It is worth stressing that, given the CTMC X, for each t ∈ R ≥0 , X(t) is a random variable defined on a countable state space.As a consequence, its distribution is given by a pmf.Likewise, the limit distribution of a CTMC, if it exists, is a pmf.
as the probability that for t → ∞, in X C , there are k molecules of λ. π λ is a pmf representing the steady state distribution of species λ.

On approximating discrete distributions with CRNs
We now show that, ffor a pmf with support in N, we can always build a CRS such that, at steady state (i.e. for t → ∞) the random variable representing the molecular population of a given species in the CRN approximates that distribution with arbitrarily small error under the L 1 norm.The result is then generalised to distributions with domain in N m , with m ≥ 1.The approximation is exact in case of finite support.

Programming pmfs Definition 4 Given
and 2|J|+2 species.For any z i ∈ J we have two species λ i , λ i,i ∈ Λ such that x 0 (λ i ) = z i and x 0 (λ i,i ) = 0.Then, we consider a species λ z ∈ λ such that x 0 (λ z ) = 1, and the species λ out ∈ Λ, which represents the output of the network and such that x 0 (λ out ) = 0.For every z i ∈ J, R has the following two reactions: τ i,1 : λ z → f (zi) λ i,i and τ i,2 : Example 1.Consider the probability mass function f : N → [0, 1] defined as f (y) = { 1 6 if y = 2; 1 3 if y = 5; 1 2 if y = 10; 0 otherwise}.Let Λ = {λ 1 , λ 2 , λ 3 , λ z , λ 1,1 , λ 2,2 , λ 3,3 , λ out }, then we build the CRS C = (Λ, R, x 0 ) following Definition 4, where R is given by the following set of reactions: Proof.Let J = (z 1 , .., z |J| ) be the support of f , and |J| its size.Suppose |J| is finite, then the set of reachable states from x 0 is finite by construction and the limit distribution of X C f , the induced CTMC, exists.By construction, in the initial state x 0 only reactions of type τ i,1 can fire, and the probability that a specific τ i,1 fires first is exactly: Observe that the firing of the first reaction uniquely defines the limit distribution of X C f , because λ z is consumed immediately and only reaction τ i,2 can fire, with no race condition, until λ i are consumed.This implies that at steady state λ out will be equal to x 0 (λ i ), and this happens with probability f (x 0 (λ i )).Since Then, we can state the following corollary of Theorem 1.
Corollary 1.Given a pmf f : N → [0, 1] with countable support J, we can always find a finite CRS C f such that π C f λout = f with arbitrarily small error under the L 1 norm.
Then, we can always consider an arbitrarily large but finite number of points in the support, such that the probability mass lost is arbitrarily small, and applying Definition 4 on this finite subset of the support we have the result.
In order to prove the result consider the function f ′ with support . This is an absolute convergent series by definition of pmf.Then, we have that lim i→∞ f (i) = 0 and, for any ǫ > 0, we can choose some κ ε ∈ N, such that: This implies that for k > κ ε given The following remark shows that the need for precisely tuning the value of reaction rates in Theorem 1 can be dropped by introducing some auxiliary species.
Remark 1.In practice, tuning the rates of a reaction can be difficult or impossible.However, it is possible to modify the CRS derived using Definition 4 in such a way the probability value is not encoded in the rates, and we just require that all reactions have the same rates.We can do that by using some auxiliary species Λ c = {λ c1 , λ c2 , ..., λ c |Λc | }.Then, the reactions τ i, number, assuming all the f (z j ) are rationals.
Remark 2. In biological circuits the probability distribution of a species may depend on some external conditions.For example, the lambda Bacteriofage decides to lyse or not to lyse with a probabilistic distribution based also on environmental conditions [5] where Com is an external input.To program this logic we can use the following reactions: τ 1,1 : λ z + λ c1 → k1 λ O1 and τ 1,2 : λ z + λ c2 → k1 λ O2 , where λ O1 and λ O2 model the two logic states, initialized at 0. The initial condition x 0 is such that x 0 (λ z ) = 1, x 0 (λ c1 ) = 50 and x 0 (λ c2 ) = 5000.Then, we add the following reaction Com → k2 λ c1 .It is easy to show that if k 2 >>> k 1 then we have the desired probabilistic behaviour for any initial value of Com ∈ N.This may be of interest also for practical scenarios in synthetic biology, where for instance the behaviour of synthetic bacteria needs to be externally controlled [3]; and, if each bacteria is endowed with a similar logic, then, by tuning Com, at the population level, it is possible to control the fraction of bacteria that perform this task.
In the next theorem we generalize to the multi-dimensional case.
, then there exists a CRS C = (Λ, R, x 0 ) such that the joint limit distribution of (λ out1 , λ out2 , ..., λ outm ) ∈ Λ approximates f with arbitrarily small error under the L 1 distance.The approximation is exact if the support of f is finite.
To prove this theorem we can derive a CRS similar to that in the uni-dimensional case.The firing of the first reaction can be used to probabilistically determine the value at steady state of the m output species, using some auxiliary species.

Special distributions
For a given pmf the number of reactions of the CRS derived from Definition 4 is linear in the dimension of its support.As a consequence, if the support is large then the CRSs derived using Theorems 1 and 2 can be unwieldy.In the following we show three optimised CRSs to calculate the Poisson, binomial and uniform distributions.These CRNs are compact and applicable in many practical scenarios.However, using Definition 4 the output is always produced monotonically.In the circuits below this does not happen, but, on the other hand, the gain in compactness is substantial.The first two circuits have been derived from the literature, while the CRN for the uniform distribution is new.
Poisson distribution.The main result of [1] guarantees that all the CRNs that respect some conditions (weakly reversible, deficiency zero and irreducible state space, see [1]) have a distribution given by the product of Poisson distributions.As a particular case, we consider the following CRS composed of only one species λ and the following two reactions τ 1 : ∅ → k1 λ; τ 2 : λ → k2 ∅.Then, at steady state, λ has a Poisson distribution with expected value k1 k2 .Binomial distribution.We consider the network introduced in [1].The CRS is composed of two species, λ 1 and λ 2 , with initial condition x 0 such that x 0 (λ 1 ) + x 0 (λ 2 ) = K and the following set of reactions: τ 1 : λ 1 → k1 λ 2 ; τ 2 : λ 2 → k2 λ 1 .As shown in [1], λ 1 and λ 2 at steady state have a binomial distribution such that: Uniform distribution.The following CRS computes the uniform distribution over the sum of the initial number of molecules in the system, independently of the initial value of each species.It has species λ 1 and λ 2 and reactions: For k > 0, τ 1 and τ 2 implement the binomial distribution.These are combined with τ 3 and τ 4 , which implement a Direct Competition system [6], which has a bimodal limit distribution in 0 and in K, where x 0 (λ 1 ) + x 0 (λ 2 ) = K, with x 0 initial condition.This network, surprisingly, according to the next theorem, at steady state produces a distribution which varies uniformly between 0 and K.
Then, the CRS described above has the following steady state distribution for λ 1 and λ 2 : Proof.We consider a general initial condition x 0 such that x 0 (λ 1 ) = K − M and x 0 (λ 2 ) = M for 0 ≤ M ≤ K and K, M ∈ N. Because any reaction has exactly 2 reagents and 2 products, we have the invariant that for any configuration x reachable from x 0 it holds that x(λ 1 ) + x(λ 2 ) = K. Figure 1 plots the CTMC semantics of the system.For any fixed K the set of reachable states from any initial condition in the induced CTMC is finite (exactly K states are reachable from any initial condition) and irreducible.Therefore, the steady state solution exists, is unique and independent of the initial conditions.To find this limit distribution we can calculate Q, the infinitesimal generator of the CTMC, and then solve the linear equations system πQ = 0, with the constraint that , where π i is the ith component of the vector π, as shown in [13].Because the CTMC we are considering is irreducible, this is equivalent to solve the balance equations with the same constraint.The resulting π is the steady state distribution of the system.
We consider 3 cases, where (K − j, j) for j ∈ [0, K] represents the state of the system in terms of molecules of λ 1 and λ 2 .
-Case j = 0.For the state (K, 0), whose limit distribution is defined as π(K, 0), we have the following balance equation: Observing Figure 1 we see that the states and the rates follow a precise pattern: every state is directly connected with only two states and for any transition the rates depend on two reactions, therefore we can consider the balance equations for a general state (K − j, j) (for the sake of a lighter notation instead of π(K − j, j) we write π j ): It is easy to verify that if π j−1 = π j = π j+1 then the equation is verified.-Case j = K.The case for the state (0, K) is similar to the case (K, 0).
We have shown that each reachable state has equal probability at steady state for any possible initial condition.Therefore, because K i=0 π i = 1 and π λi (y) = xj∈S|xj(λi)=y π j for y ≥ 0, we have that for both λ 1 and λ 2

Calculus of limit distributions of CRNs
In the previous section we have shown that CRNs are able to program any pmf on N. We now define a calculus to compose and compute on pmfs.We show it is complete with respect to finite support pmfs on N.Then, we define a translation of this calculus into a restricted class of CRNs.We prove the soundness of such a translation, which thus yields an abstract calculus of limit distributions of CRNs.For simplicity, in what follows we consider only pmfs with support in N, but the results can be generalised to the multi-dimensional case in a straightforward way.
Definition 5 (Syntax).The syntax of formulae of our calculus is given by A formula P denotes a pmf that can be obtained as a sum, minimum, multiplication by a rational, or convex combination of pmfs one and zero.Given a formula P , variables V = {c 1 , ..., c n }, called environmental inputs, model the influence of external factors on the probability distributions of the system.V (P ) represents the variables in P .An environment E : V → Q [0,1] is a partial function which maps each input c i to its valuation normalized to [0, 1].Given a formula P and an environment E, where V (P ) ⊆ dom(E), with dom(E) domain of E, we define its semantics, [[P ]] E , as a pmf (the empty environment is denoted as ∅).D expresses a summation of valuations of inputs c i weighted by rational probabilities p, which evaluates to a rational [[D]] E for a given environment.We require that, for any D, the sum of The semantics is defined inductively as follows, where the operations on pmfs are defined in Section 4.1.
To illustrate the calculus, consider the Bernoulli distribution with parameter p ∈ Q [0,1] .We have bern p = (one) p : zero, where [[bern p ]] ∅ (y) = {p if y = 1; 1 − p if y = 0; 0 otherwise}.The binomial distribution can be obtained as a sum of n independent Bernoulli distributions of the same parameter.Given a random variable with a binomial distribution with parameters (n, p), if n is sufficiently large and p sufficiently small then this approximates a Poisson distribution with parameter n • p.

Operations on distributions
In this section, we define a set of operations on pmfs needed to define the semantics of the calculus.We conclude the section by showing that these operations are sufficient to represent pmfs with finite support in N.
-The multiplication of π 1 by the constant k 1 is defined as -The convex combination of π 1 and π 2 , for y ∈ N, is defined The convex combination operator is the only one that is not closed with respect to pmfs whose support is a single point.Lemma 1 shows that this operator is not associative with respect to minimum and sum of pmfs.
Lemma 1.Given probability mass functions and k ∈ Q ≥0 , then the following equations hold: Example 2. Consider the following formula with set of environmental variables V = {c} and an enviroment E such that V (P 1 ) ⊆ dom(E).Then, according to Definition 7 we have that Having formally defined all the operations on pmfs, we can finally state the following proposition guaranteeing that the semantics of any formula of the calculus is a pmf.
Proposition 1.Given P , a formula of the calculus defined in Definition 5, and an environment E such that V (P ) ⊆ dom(E), then The following theorem shows that our calculus is complete with respect to finite support distributions.
Theorem 4. For any pmf f : N → [0, 1] with finite support there exists a formula P such that : ... : Proof of Theorem 4 relies only on a subset of the operators, but the other operators are useful for composing previously defined pmfs.

CRN implementation
We now show how the operators of the calculus can be realized by operators on CRSs.The resulting CRSs produce the required distributions at steady state, that is, in terms of the steady state distribution of the induced CTMC.Thus, we need to consider a restricted class of CRNs that always stabilize and that can be incrementally composed.The key idea is that each such CRN has output species that cannot act as a reactant in any reaction, and hence the counts of those species increase monotonically. 4 This implies that the optimized CRSs shown in Section 3.2 cannot be used compositionally.

Non-reacting output CRSs (NRO-CRSs)
Since in the calculus presented in Definition 5 we consider only finite support pmfs, in this section we are limited to finite state CTMCs.This is important because some results valid for finite state CTMCs are not valid in infinite state spaces.Moreover, any pmf with infinite support on natural numbers can always be approximated under the L 1 norm (see Corollary 1).Given a CRS C = (Λ, R, x 0 ), we call the non-reacting species of C the subset of species Λ r ⊆ Λ such that given λ r ∈ Λ r there does not exist τ ∈ R such that r λr τ > 0, where r λr τ is the component of the source complex of the reaction τ relative to λ r , that is, λ r is not a reactant in any reaction.Given C we also define a subset of species, Λ o ⊆ Λ, as the output species of C. Output species are those whose limit distribution is of interest.In general, they may or may not be non-reacting species; they depend on the observer and on what he/she is interested in observing.NRO-CRNs are CRSs in which the output species are produced monotonically and cannot act as a reactant in any reaction.A consequence of Theorem 1 is the following lemma, which shows that this class of CRNs can approximate any pmf with support on natural numbers, up to an arbitrarily small error.
Lemma 2. For any probability mass function f : N m → [0, 1] there exists a NRO-CRS such that the joint limit distribution of its output species approximates f with arbitrarily small error under the L 1 norm.The approximation is exact if the support of f is finite.
In Table 1, we define a set of operators on NRO-CRSs.Let ∅. Let λ o1 ∈ Λ o1 and λ o2 ∈ Λ o2 , then we define the set of reactions which implements the operators of Sum, Minimum, Multiplication by a constant k 1 ∈ N and Division by a constant k 2 ∈ N ≥0 over the steady state distribution of λ o1 and λ o2 .The output species of each composed NRO-CRS is λ out , and we assume We emphasize that proving that CRS operators of Table 1 implement the operations in Definition 7 is not trivial.In fact, we need to compose stochastic processes and show that the resulting process has the required properties.Fundamental to that end is a convenient representation of X in terms of a summation of time-inhomogeneous Poisson processes, one for each reaction [2].In what follows we present in slightly extended form the operators for convex combination, with or without external inputs (respectively Con(•) and ConE(•)).Formal definitions and proofs of correctness of all the circuits are presented in [8].Considering C 1 and C 2 , as previously, then we need to derive a CRS operator Con(C 1 , λ o1 , C 2 , λ o2 , p, λ out ) such that π λout = (π C1 λo 1 ).That is, at steady stade, λ out equals π C1 λo 1 with probability p and π C2 λo 2 with probability 1 − p.This can be done by using Theorem 2 to generate a bi-dimensional synthetic coin with output species λ r1 , λ r2 such that their joint limit distribution is π λr 1 ,λr 2 (y 1 , y 2 ) = {p if y 1 = 1 and y 2 = 0 ; 1 − p if y 1 = 0 and y 2 = 1; 0 otherwise}.That is, λ r1 and λ r2 are mutually exclusive at steady state.Using these species as catalysts in τ 3 : λ o1 +λ r1 → λ r1 +λ out and τ 4 : λ o2 + λ r2 → λ r2 + λ out we have exactly the desired result at steady state.
Example 3. Consider the following NRO-CRSs C 1 = ({λ o1 }, {λ o1 }, {}, x 01 ) and C 2 = ({λ o2 }, {λ o2 }, {}, x 02 ), with initial condition x 01 (λ o1 ) = 10 and x 02 (λ o2 ) = 20.Then, the operator ) and it is given by the following reactions: Let C 1 , C 2 be as above and f = p 0 + p 1 • c 1 + ... + p n • c n with p 1 , ..., p n ∈ Q [0,1] , V = {c 1 , ..., c n } a set of environmental variables, and E, an environment such that V ⊆ dom(E).Then, computing a CRS operator ) is a matter of extending the previous circuit.First of all, we can derive the CRS to compute f (E(V )) and 1 − f (E(V )) and memorize them in some species.This can be done as f (E(V )) is semi-linear [9].Then, as f (E(V )) ≤ 1 by assumption, we can use these species as catalysts to determine the output value of λ out , as in the previous case.As shown in [8], this circuit, in the case of external inputs, introduces an arbitrarily small, but non-zero, error, due to the fact that there is no way to know when the computation of f (E(V )) terminates.
) and it is given by the following reactions: and store these values in λ Cat1 and λ T ot .These are used in reactions τ 3 and τ 4 to determine the probability that the steady state value of λ out is going to be determined by reaction τ 5 or τ 6 .

Compiling into the class of NRO-CRSs
Given a formula P as defined in Definition 5, then [[P ]] E associates to P and an environment E a pmf.We now define a translation of P , T (P ), into the class of NRO-CRSs that guarantees that the unique output species of T (P ), at steady state, approximates [[P ]] E with arbitrarily small error for any environment E such that V (P ) ⊆ dom(E).In order to define such a translation we need the following renaming operator.
, where R{λ 1 ← λ t } substitutes any occurrence of λ t with an occurrence of λ 1 for any τ ∈ R and This operator produces a new CRS where any occurrence of a species is substituted with an occurrence of another species previously not present.
Then, we have that : and all other species initialized with 0 molecules.Proposition 2. For any formula P we have that T (P ) is a NRO-CRS.
The proof follows by structural induction as shown in [8].Given a formula P and an environment E such that V (P ) ⊆ dom(E), the following theorem guarantees the soundness of T (P ) with respect to [[P ]] E .In order to prove the soundness of our translation we consider the measure of the multiplicative error between two pmfs f 1 and f 2 with values in N m , m > 0 as ). Theorem 5. (Soundness) Given a formula P and λ out , unique output species of T (P ), then, for an environment E such that V (P ) ⊆ dom(E), it holds that π T (P ) λout = [[P ]] E with arbitrarily small error under multiplicative error measure.The proof follows by structural induction.Remark 3. A formula P is finite by definition, so Theorem 5 is valid because the only production rule which can introduce an error is (P 1 ) D : (P 2 ) in the case D = p 0 , and we can always find reaction rates to make the total probability of error arbitrarily small.Note that, by using the results of [17], it would also be possible to show that the total error can be kept arbitrarily small, even if a formula is composed from an unbounded number of production rules.This requires small modifications to the ConE operator following ideas in [17].
Note that compositional translation, as defined in Definition 10, generally produces more compact CRNs respect to the direct translation in Theorem 1, and in both cases the output is non-reacting, so the resulting CRN can be used for composition.For a distribution with support J direct translation yields a CRN with 2|J| reactions, whereas, for instance, the support of the sum pmf has the cardinality of the Cartesian product of the supports of the input pmfs.

Discussion
Our goal was to explore the capacity of CRNs to compute with distributions.This is an important goal because, when molecular interactions are in low number, as is common in various experimental scenarios [15], deterministic methods are not accurate, and stochasticity is essential for cellular circuits.Moreover, there is a large body of literature in biology where stochasticity has been shown to be essential and not only a nuisance [11].Our work is a step forward towards better understanding of molecular computation.In this paper we focused on error-free computation for distributions.It would be interesting to understand and characterize what would happen when relaxing this constraint.That is, if we admit a probabilistically (arbitrarily) small error, does the ability of CRNs to compute on distributions increase?Can we relax the constraint that output species need to be produced monotonically?Can we produce more compact networks?Another topic we would like to address is if it is possible to implement the CRNs without leaders (species being present with initial number of molecules equal to 1).This is a crucial aspect in our theorems and having the same results without these constraints would make the implementation easier.However, it is worth noting that, in a practical scenario, such species could be thought of as a single gene or as localized structures [15].

A Examples
Example 6.Consider the following probability mass function , if y 1 = 1 and y 2 = 5 0, otherwise we present the CRS C = (Λ, R, x 0 ) that according to its stochastic semantics, for λ out1 , λ out2 ∈ Λ yields the steady-state distribution π λout 1 ,λout 2 , joint limit distribution of λ out1 , λ out2 , exactly equal to f .Let Λ = {λ z , λ a , λ b , λ c , λ 1,1 , λ 1,2 λ 2,1 , λ 2,2 , λ 3,1 , λ 3,2 λ out1 , λ out2 } and R given by the following set of reactions: The initial condition x 0 is such that: all other species mapped to zero.The set of reachable states from x 0 is finite so the limit distribution exists.The firing of the first reaction uniquely determines the steady state solution.x 0 (λ i,1 ) and x 0 (λ i,2 ) for i ∈ [1,3] are exactly the value of λ out1 and λ out2 at steady state if the first reaction to fire is τ i , this happens with probability f (x 0 (λ i,1 ), x 0 (λ i,2 )).Therefore, we have that, at steady state, the joint distribution of λ out1 and λ out2 equals f .Example 7. Consider the following pmf π 1 : N → [0, 1] Then the sum of π 1 and π 2 is: The follow example shows the use of the sum CRN operator Example 10.We consider the pmfs π 1 and π 2 of example 7. Using the results of theorem 1 we build the CRSs C 1 and C 2 such that λ out1 and λ out2 , unique output species of C 1 and C 2 respectively, admit as steady state state distribution exactly π 1 and π 2 .C 1 = ({λ z , λ 1 , λ 1,1 , λ o1 }, {λ o1 }, R, x 0 ) has the following reactions where ∅ is the empty set and x 0 is such that: The CRS C 2 has the following reactions with initial condition x 0 such that: Then, applying the sum operator circuit we add the following reactions λ o2 , λ out ) has unique output species λ out , whose limit distribution, π λout , is equal to π 1 + π 2 described in example 7.In figure 2 we plot the solution of the CME for λ out .
Example 11.We consider the CRSs C 1 and C 2 of example 10, and we apply the operator M in(C 1 , λ o1 , C 2 , λ o2 , λ out ).Now, λ out is the only output species.In figure 3 we plot the solution of the CME for λ out and it is possible to note that its limit distribution reaches the desired value.
Example 12.We consider the CRS C 2 of example 10, then, applying the circuit for the multiplication by k = 2, we have the following reaction In figure 4 we plot the solution of the CME for λ out .Proof.The proof is by structural induction on the structure of P .We need to show that for P = one and P = zero the property holds and that any rule of the language produces a pmf.
-P=one and P=zero.
We need to show that, assuming -P = min(P 1 , P 2 ).
Again, we need to show that assuming [ ) is a pmf as well.We can show that in the exactly same way as the summation case.
Again, we need to show that assuming In order to do that we can note that [[k Then, it is enough showing that given a pmf π both k 1 • π and π k2 produce a pmf.This can be done in the exact same way as the previous cases. .This can be shown as in the precedent cases.
Lemma 2. For any probability mass function f : N m → [0, 1] there exists a NRO-CRS such that the joint limit distribution of its output species approximates f with arbitrarily small error under the L 1 norm.The approximation is exact if the support of f is finite.
Proof.This lemma is a consequence of Theorems 1 and 2. In fact, by construction, all CRSs used in those theorems are non-reacting output.
The following lemma characterizes the behaviour of non-reacting output CRSs and will be useful in what follows.It states that the propensity rates of all reactions of a NRO-CRS are not dependent on the output species.

B.1 Alternative representation CTMC
CRNs are well represented by Continuous Time Markov Chains (CTMCs).Here, we consider the derivation of the stochastic model presented in [2].Given a CRS C = (Λ, R, x 0 ), the number of times that a reaction τ ∈ R fires between [0, t] can be modeled as an independent unit Poisson process Y τ of intensity t 0 α τ (s)ds.Therefore, the stochastic semantics of C = (Λ, R, x 0 ) satisfies the following stochastic equations ] describes the stochastic evolution of molecular populations of each species at time t is a homogeneous CTMC with a countable state space S ⊆ N |Λ| .In what follows we give we formalize the CRS operators introduced in Section 5 and show their correctness.
Proof.Given a reaction τ the number of occurrences of τ until time t can be modelled as an independent unit Poisson process, Y τ ( t 0 α τ (X Cc (s)) ds), as shown in [2].Therefore, considering the counting processes J Cc λo 1 and J Cc λo 2 which give the number of molecules of λ o1 and λ o2 produced until time t in C c , then where p represent CTMCs given by the sum of |R| + 2 independent Poisson processes, one for each reaction.Indeed, the firing of the reaction τ produces p λo τ molecules by definition.However, p are such that for all λ ∈ (Λ 1 − {λ o1 }), x 1 (λ) = x 2 (λ) then α τ (x 1 ) = α τ (x 2 ) for all τ ∈ R 1 because of Lemma 3 (recall that Moreover, r λ τs 1 = p λ τs 1 = r λ τs 2 = p λ τs 2 = 0 for any λ ∈ Λ − {λ out , λ o1 , λ o2 }, that is, τ s1 and τ s2 do not produce or consume any species in Λ − {λ out , λ o1 , λ o2 }.Therefore, because x 0 (λ) = x 01 (λ) for all λ ∈ Λ 1 − {λ o1 }, we have t 0 α τ (X Cc (s)) ds = t 0 α τ (X C1 (s)) ds for all τ ∈ R 1 .In exactly same way it possible to show that the same relation holds for λ o2 with respect to X C2 , and as a consequence it is also true that t 0 α τ (X Cc (s)) ds = t 0 α τ (X C2 (s)) ds for all τ ∈ R 2 .As a result: Considering that λ o1 is an output species in C 1 and λ o2 is an output species in C 2 , that is, NRO-CRSs, then for any τ ∈ R 1 we have that υ τ .As a consequence: According to the fact that in the composed NRO-CRS λ out is produced only by τ s1 and τ s2 is not consumed in any reaction, its initial molecular count is 0 and p λout τs 1 = p λout τs 2 = 1, then it is possible to write: In the same way we can define the stochastic model for the number of molecules of λ o1 or λ o2 present in C c during time given by the number of molecules produced minus the number of molecules consumed, considering that λ o1 and λ o2 are consumed only by τ s1 and τ s2 and they are not reactant in any other reaction: (0) by assumption.The set of reachable states from x 0 in X Cc is finite because the set of reachable states from x 01 in X C1 and from x 02 in X C2 are finite by assumption and τ c can fire only a finite number of times.This implies that X Cc (t) for t → ∞ will reach a BSCC of the underlying graph of the state space, with probability 1 in finite time, because of a well known result of CTMC theory [13].In a BSCC, any pair of configurations x 1 and x 2 are such that x 1 → * x 2 and x 2 → * x 1 .Therefore, any configuration x in any BSCC reachable by X Cc from x 0 is such that x(λ o1 ) = x 0 (λ o2 ) = 0, because in a configuration x i where x i (λ o1 ) > 0 or x i (λ o2 ) > 0 it is always possible to reach a configuration x j where x j (λ o1 ) = x i (λ o1 ) − 1 or x j (λ o2 ) = x i (λ o2 ) − 1 and x j (λ out ) = x i (λ out ) + 1, but then there is no way to reach x i from x j because λ out is not reactant in any reaction in This shows that in any realization of X Cc the number of molecules of λ out is exactly the sum of the molecules of λ o1 and λ o2 that we would have had if the rates of τ s1 and τ s2 were 0.
The proof that the steady state distribution of λ out is as desired follows the Sum case.Using the same arguments it is possible to show that X Cc λo 1 Again, the number of states reachable from the initial condition in C c is finite, therefore for t → ∞ X Cc (t) will reach a BSCC with probability 1 in finite time.In any state of a reachable BSCC from x ′ 0 there must be 0 probability that τ min fires, indeed, λ out is produced only by τ min and it is not consumed in any reaction.As a consequence: This implies the statement, indeed, λ out is produced by the only reaction τ min and τ min is the only reaction which consumes λ o1 and λ o2 .Therefore, for any realization of X Cc , τ min will fire exactly the number of times that the minimum between the number of molecules of λ o1 and λ o2 produced.At this point, it is enough to note that υ λout τC c = 1.Proof.The proof follows the proofs for sum and minimum circuits.Then for λ o1 ∈ Λ o1 , λ o2 ∈ Λ o2 we define Con(C 1 , λ o1 , C 2 , λ o2 , p, λ out ) = (Λ ′ , {λ out }, R ′ , x 0 ) with where R C is given by the following reactions: τ 1 : λ z → p λ r1 ; τ 2 : λ z → 1−p λ r2 ; τ 3 : λ o1 + λ r1 → λ r1 + λ out ; with probability 1 − f (E(V )).If τ 1 is the first reaction to fire then at steady state λ out will be equal to the limit value of λ o1 , and if τ 2 fires first then it goes exactly to the limit value of λ o2 .
Proposition 2. Given a formula P then T (P ) is a NRO-CRSs.
Proof.The proof is by structural induction.The base cases are T (zero) and T (one), which are NRO-CRSs by definition.Assuming T (P 1 ) and T (P 2 ) are NRO-CRNs then application of operators of sum, M ul, Div, M in, Con and ConE on these CRSs produces a NRO-CRNs by definition of the operators.
Theorem 5. (Soundness) Given a formula P and λ out , unique output species of T (P ), then, for an environment E such that V (P ) ⊆ dom(E), it holds that π T (P ) λout = [[P ]] E with arbitrarily small error under multiplicative error measure.
Proof.The proof is by structural induction.As a basic case, we need to prove that, given one and zero, then the output species of T (one) and T (zero), at steady state, have a distribution for their output species equal to [[one]] E and [[zero]] E .Then, we need to show that, assuming the theorem holds for the preceding steps, any application of a rule of the grammar produces a CRN such that its output species at steady state has a distribution which approximates the pmf given by the semantics of the formula with arbitrarily small error under multiplicative error measure.That is, the two distributions have the same support and the distance between the probability associated to a general point of the support is arbitrarily small.
one.T (one) is a CRS with no reactions and a single species λ out which is also an output species.x 0 (λ out ) = 1 as a consequence π T (one) λout (1) = 1.-zero.
T (zero) is a CRN with no reactions and a single species λ out which is also an output species.x 0 (λ out ) = 0 as a consequence π T (zero) λout (0) = 1.. -P 1 + P 2 .

Fig. 1 :
Fig. 1: The figure shows the CTMC induced by the CRS implementing the uniform distribution for initial condition x 0 such that x 0 (λ 1 ) + x 0 (λ 2 ) = K.

Definition 8 A
non-reacting output CRS (NRO-CRS) is a tuple C = (Λ, Λ o , R, x 0 ), where Λ o ⊆ Λ are the output species of C such that Λ o ⊆ Λ r , where Λ r are the non-reacting species of C.

Fig. 2 :
Fig. 2: The figure plots the CME solution for the species λ out .Vertical axes represent the number of molecules of λ out , while horizontal axes represent time or marginal probability of molecules of species λ out

Fig. 3 :
Fig. 3: The figure plots the CME solution for species λ out .Vertical axes represent the number of molecules of λ out , while horizontal axes represent time or marginal probability of molecules of species λ out

Fig. 4 :
Fig. 4: The figure plots the CME solution for species λ out .Vertical axes represent the number of molecules of λ out , while horizontal axes represent time or marginal probability of molecules of λ out

λo 1 τ and p λo 2 τ
represent the number of molecules of λ o1 and λ o2 produced by the occurrence of reaction τ .J Cc λo 1 and J Cc λo 2