Introduction

The investigation of the generic properties of catalytic reaction networks, with specific regard to the emergence of the so-called autocatalytic cycles or sets (referred to as ACS hereafter), is one of the key issues in systems chemistry and, particularly, in the research concerning the origin of life.

The debate on the transition from non-living to living is still open and compelling [14] and several different theoretical frameworks have been proposed, such as the metabolic-first hypothesis [510], the protein - first scenario [1113], the compartmentalization [14], the compositional approach [15, 16] and the well-known gene - first hypothesis embodied in the RNA world theory [4, 1722].

Despite the differences, the underlining principle beyond these theories is the necessity for the existence of robust reaction pathways for the production of the molecular species involved in the transition. Some authors hypothesised the existence of linear chemical processes capable of produce substantial amount of the species under plausible prebiotic conditions [23] or at energy-rich site such as hydrothermal vents [24]. Alternatively, great emphasis has been put on chemical cycles embodied by autocatalytic network [25] as a means to produce molecular species and achieve self-sustenance.

Several models have been developed in the course of time to describe this kind of systems, including the works by Dyson [26], Eigen and Schuster [2730], Kauffman [7], Jain and Khrishna [31], Lancet [15, 16] and Kaneko [32], while recently the formalism introduced by Steel et al.[3335] has contributed in setting a standard theoretical framework for the algorithmic detection of autocatalytic networks within the original model designed by Kauffman [7].

Following [7] an ACS is a subset of chemicals which production is catalysed by at least one other member of the subset. In spite of their important differences, most models describe a phase transition such that, once certain conditions are satisfied, ACSs spontaneously emerge. It is important to notice that the presence of autocatalytic cycles may lead to a significant departure of the concentrations of the elements belonging to the cycle from the expected one. These out-of-equilibrium concentrations are in principle amenable to experimental analysis.

In particular, Kauffman, through the analysis of the structural properties of the reaction graph [7] claims that, increasing the maximum length of the molecules present in the system, the number of possible reactions increases faster than the number of molecules: therefore, whatever is the probability for a reaction to be catalysed, it will be sufficient to feed the system with the right number of different molecules to observe the emergence of an autocatalytic set. The kinetics was explicitly introduced within the model, through a set of ordinary differential equations, by Farmer and coworkers [36, 37]. They observed that once that the average connectivity of the reactions graph is greater than 1, i.e. supra-critical, cycles of all length begin to form. These results were confirmed also in the model by Bagley [38, 39] in which all the molecules characterised by concentrations that are below a certain threshold are removed from the graph and new molecules are introduced within the system according to the reaction scheme.

Nevertheless, despite the theoretical predictions, observing ACSs in wet experiments is an indeed difficult task. It is possible that the simplifying assumptions at the basis of the in-silico experiments are unrealistic with respect to the actual features of the real world or, on the contrary, that the experiments lack the correct features indicated by the theory and hence the necessary conditions for the emergence of ACSs are not met.

As progress beyond the state of the art, in this work we introduce a novel model for the study of the emergence of autocatalytic cycles that is explicitly intended to fill the gap between theoretical predictions and experimental findings and provide useful insight for further experimentations.

In our model two kind of reactions are taken into account, as firstly proposed by Kauffman [7]: namely cleavage and condensation, whereby one polymer is divided into two short polymers in the former case and two polymers are concatenated forming a longer polymer in the latter case. Each reaction must be catalysed by another species in the system to occur and one of the assumptions is that any chemical species has an independent probability to catalyse a randomly chosen reaction.

With respect to the original model, the main novelties are the following.

First, the model is stochastic and is aimed to specifically consider the influence of the molecules present with few copies and the importance of noise and random fluctuations on the overall dynamics.

Second, the model is capable to increase in complexity as new entities can be introduced in the system by either condensation or cleavage of the existing elements.

Third, we introduce a temporal threshold which allows to exclude those reaction that do not occur at least twice within the specific time frame when analysing the dynamical evolution of the system: in this way it is possible to clearly detect cycles in a asynchronous stochastic system and to neglect the structural influence of very rare reactions.

Finally, particular attention is devoted to those cases in which the system is close to instability points (i.e. critical systems) and those in which the total amount of some species is particularly low.

In this work we are not interested in investigating the nature of the specific chemicals represented in the system, nor in the particular interactions among them. We are rather interested in the description and characterisation of the dynamical behaviour that emerges from the interaction of a whole set of interacting chemicals and in the detection of possible generic properties of this kind of systems.

Description of the model

The basic entities of the model are linear chains, species from now on, oriented from left to right, composed of symbols or letters from a finite alphabet (e.g. A, B, C...; G, A, T, C). We refer to species composed of single symbols also as "monomers" and to species composed of more than one letter as "polymers". Following the formalism introduced by Steel [40]X stands for the entire set of species, while each single species will be denoted by x i , i = 1... N. The amount (quantity) of species x i in the reaction vessel (equal to its concentration times the reactor volume) will be denoted by . In the rest of the work we will make use either of amount or concentration in accordance with the specific context; nevertheless since the two quantities are proportional the relation is straightforward.

The dynamics of the system is ruled, as in the original Kauffman model, by two different reactions, namely condensation and cleavage. By means of the former two species are concatenated together forming a longer species (e.g. AB + BAABBA) whereas by means of the latter a longer species is cut to create two shorter species (e.g. ABBBA + BBB). Notice that, with regard to the condensation, there could be two equal products starting from two different substrates (e.g. AA + BAAB and A + ABAAB), and on the other hand, with regard to the cleavage, there could be two (or more) different products starting from the same substrate (e.g. BAABBAA + B and BAABBA + AB).

In order to keep the system far from the equilibrium all the reactions are simulated in a well-stirred chemostat, in presence of continuous ingoing and outgoing fluxes. In our model we assume that the Gibbs energy ΔG for any reaction of the system is sufficiently large to neglect the presence of backward reactions in absence of a suitable energy source.

According to the number and the length of the species present in the reactor, we define R as the set of all the conceivable reactions and the cardinality of R at a given time is given by

(1)

Where N is the cardinality of X and L(x i ) is the length of the i-th species. The first term of eq. 1 represents the number of conceivable cleavages whereas the second term indicates the number of conceivable condensations.

Since the analysis is focused on the behaviour of catalytic reactions networks, we assume that the activation energy for any reaction is sufficiently high so that the rate of the spontaneous reactions is negligible with respect to that of the catalysed reactions: thus, uncatalysed reactions are not taken into account in the model.

Finally we assume the same catalytic rate for all catalysts so that all reactions are speeded up to the same extent. In the present version this model neglects any catalysis provided by elements other than species belonging to the system, even though environmental catalysts are thought to have played a relevant role in prebiotic synthesis [41]. For a given realisation, the subset R pos R containing all the reactions of R having at least one catalyst x cat X is created.

A very important issue concerns the determination of the probability that a given species catalyses a certain reaction. For simplicity, in models like this, which aim at uncovering generic properties, a single parameter p is used, which represents the probability that a randomly chosen species catalyses a randomly chosen reaction.

In regard to this key parameter we quote two extreme points-of-view. The first is that proposed by Kauffman [7, 36], which hypothesises that each specie has a constant and identical probability to catalyse a randomly chosen reaction, irrespectively of the number of different species. Conversely, Lifson [42] assumes that each species can catalyse a finite number of reactions; in this case the increase in the number of species would not be correlated to an increase in the total number of catalysed reactions, which reaches a constant value. In the first case systems characterised by sufficient heterogeneity will necessarily display the emergence of at least one ACS, as shown by Kauffman [7], whereas in the second case ACSs are not predictable. Steel and other authors [35, 40] demonstrated that ACSs can emerge also in conditions which are somehow intermediate between those hypothesised by Kaufmann and those by Lifson. In real systems specialised and fine-tuned catalysts coexist with less specialised ones [43]. It is therefore very difficult to hypothesise a general law for the scaling of the probability p with the number of species N. For the purpose of the present work we adopt the simplest alternative and we assume that p is constant.

The relationship between the occurrence of ACSs and the initial average connectivity, i.e. the ratio between the number of reactions and the number of species, is shown in Figure 1. In the subsequent simulations we chose an average connectivity equal to 1, corresponding to a reaction probability equal to 10-3; it is noteworthy that this value should favour the appearance of ACSs [36], nevertheless we observe that ACSs tend to be fragile even under this permissive conditions.

Figure 1
figure 1

Influence of the average connectivity on the emergence of ACSs. The graph shows the influence of the average connectivity on the fraction of simulations showing the emergence of at least one ACS. On the x-axis different values of the average connectivity are shown while on the y-axis the fraction of simulations with at least one ACS is shown.

Virtual and actual firing disks

For each realisation of the model, it is necessary to define the initial population, the scheme of the possible reactions and the composition of the incoming flux.

In order to maintain a certain degree of generalisation and to explore different possible initial conditions the initial population is set up considering two different layers of species.

First, a population X v forming what we call the "virtual ring disk", which is composed of all the species up to a certain length M, is created. Note that the term virtual refers to the fact that at this stage we have not yet defined which species are actually present within the reactor. We define here a subset X a X v , forming what we define as the "actual ring disk", containing all the species which concentrations are greater than 0. X a is composed of all the species up to a certain length M a , with M a M, and sometimes of some species which lengths are greater than M a and smaller than M. The concentration of each species belonging to X a is chosen in accordance with a predefined distribution. A schematic representation of X v and X a is shown in Figure 2.

Figure 2
figure 2

The virtual and actual firing disks. A graphical representation of both the virtual ring disk X v and the actual ring disk X a . The dark grey zone represents the actual ring disk containing all the species with concentration different from 0, whereas the light grey zone represents all the species with initial concentration set to 0.

The subset composed of all the species present in the incoming flux will be called FX v .

The dynamics

Given that each reaction occurs in a well-stirred chemostat, the concentration of each species is assumed to be uniform all over the space. Such concentrations change in time by means of an asynchronous stochastic update based on the well-known Gillespie algorithm [2830], according to which at each step of the simulation a reaction is chosen (among all the possible ones) and the physical time of the reaction is computed. Thus, the system behaves according to the characteristic of the initial chemical population, the composition and rate of the incoming flux and the rate of the outgoing flux.

It is important to note that the two key reactions of the system are not treated at the same way. Whereas cleavage is a bi-molecular reaction, so that when the catalyst binds the substrate the reaction instantaneously occurs, condensation is a three-molecular reaction, and it needs two steps to occur: in the first step the catalyst binds the substrate forming a transient molecular complex, while in the second step the complex binds the second substrate releasing the condensed product and the unmodified catalyst. Notice that also the spontaneous dissociation of the complex is taken into account. Hence, we can summarise the reaction scheme as following:

• Cleavage: AB + CA + B + C

• Condensation: (whole reaction: A + B + CAB + C)

- Complex formation: A + CA : C

- Complex dissociation: A : CA + C

- Final condensation: A : C + BAB + C

where A and B are two generic substrates involved in a specific reaction, C is the specific catalyst for that reaction and A : C represents a transient complex, which is necessary for the condensation process to happen.

The outgoing flux is simulated assigning a common decay time k out to each species and molecular complex; the incoming flux rate K is measured in moles per second and the average residence time τ is given by 1/k out .

Although the model is based on a stochastic approach, in order to speed up the computation both the incoming and the outgoing flux are described through an approximated algorithm. The Gillespie algorithm allows to physically define a time interval Δt between two successive reactions, hence the number of molecules introduced by the influx at each Δt is given by K Δt and the number of molecules to wash out is where stands for the total concentration of the species actually present in the system at the time t: both the actual species to be introduced and washed out are chosen in accordance with the relative concentrations within the influx and the reactor composition respectively.

Considering that some species may totally vanish because of the internal dynamics, all the reactions in which the vanished species have been involved are however conserved, in order to maintain the consistency of the system (in the case that those species may reappear).

We here stress that the use of a particle-based algorithm allows the emergence of competition and inhibition phenomena. The former pertains to the impossibility for any individual molecule that acts as a catalyst to be involved in more than one reaction at a time when it is involved in a transient complex, whereas the latter occurs when, for instance, a reaction chain is interrupted because some elements of the chain are, temporarily or permanently, consumed by other reactions.

For ease of comprehension, in table 1 an example of dynamics is provided.

Table 1 Example of dynamics

The Reaction Graphs

Two distinct representations describing the same system and the same dynamics are indeed possible. The first one describes the catalytic activity of the system: an edge from x c to x i is drawn if species x c catalyses the production of the species x i .

The other representation, instead, describes the assembling activity: an edge from x j to x i is drawn when x j is a substrate involved in the production of x i .

Consider, for instance, the following three reactions occurred at times t(1), t(2) and t(3):

  1. 1.

    A + B + BBBB + AB

  2. 2.

    ABA + BBAB + A + BB

  3. 3.

    AA + ABAA + A + B

The two representations, including the overall scheme, are shown in Figure 3.

Figure 3
figure 3

Reactions graph representations. The cartoon of the two actual reaction graphs based on the two different representations is shown, catalyst → product and substrate → product. In addition the overall representation as in [7] is provided.

Furthermore, the choice of using an asynchronous update framework entails another problem, that of identifying which is the "appropriate" reaction graph of the system.

One first option is that of representing the reaction graph taking into account all the reactions that occurred at least once during the time of the simulation: this is what we call the "complete reaction graph", G c from now on [44]. Nevertheless, we must keep in mind that only one reaction at a time occurs and that some reactions occur very rarely.

The importance of consistently define cycles within an asynchronous framework induced us to introduce another graph in which each link is maintained if and only if the specific reaction occurs within a temporal window defined as W; otherwise, the link corresponding to that reaction is removed from the graph. This is another key novelty of the model and allows to neglect the influence of very rare reaction, i.e. those occurred in the past a very few times. The graph in which only the reactions that occur within the specific temporal window are considered is therefore called the "actual reaction graph", G a from now on. Clearly, the actual reaction graph changes in time according to the dynamical evolution of the system.

Lastly, we define as the "possible reaction graph", G p from now on, the graph in which all the possible reactions at a given time, and not only those that actually occur, are considered. Through the analysis of this graph we obtain indications about the nearest possible futures [45].

Comparing the influx diversity and the influx composition

In a previous work [46] we showed that, increasing the influx diversity in terms of different species, the chance of emergence of ACSs in the G a increases as well. However in that work the incoming flux contains all the species up to to a maximum length M in and it was therefore impossible to decouple the effects of the influx diversity (i.e. number of different species) from those due to the maximum length of the species belonging the incoming flux. In this section we analyse the two different effects.

One first set of experiments regards different ensembles of catalytic reaction networks, in which the composition of the incoming flux differs in the number of distinct species belonging to them, keeping K fixed.

We considered a first set of networks in which the influx is composed of all the species up to length 3; in the second set a randomly chosen 20% of all the possible species with length 4 is added to the previous set; the subsequent sets are composed of enlarging the set of length-4 species following this procedure, up to the case in which all the species up to length 4 are present in the incoming flux.

In Figure 4 and 5 we can observe the variation of the average number of molecules that do not belong to the influx as a function of time, in relation to the different ensembles of networks described above. We can notice that the average concentration tends to stabilise around asymptotic values which are different for the different ensembles (Figure 4) highlighting a clear order relation: those networks characterised by an influx composed of a larger fraction of species with length 4 seem to stabilise around higher average concentrations. This result is confirmed analysing the variation of the overall living species (defined as the species with concentration larger than 0) present in the reactor in time, in the different cases (Figure 5).

Figure 4
figure 4

Different influx compositions (molecules). The graph shows the average number of molecules not belonging to the influx in relation to the different ensembles in function of time. The error bar represents the standard error.

Figure 5
figure 5

Different influx compositions (Species with concentration greater than 0). The graph shows the average number of species with concentration greater than 0 in relation to the different ensembles in function of time. The error bar represents the standard error.

In Figure 6 we can observe the variation of the number of ACSs that emerge from the dynamics in the different cases, whereas in Figure 7 we focus on the variation in time of the species belonging to the ACSs. Also in this case we can notice that, in correspondence of those ensembles characterised by an influx that is composed of a larger fraction of species of length 4, both the probability of observing the emergence of an ACS and the total number of species belonging to ACSs are higher.

Figure 6
figure 6

Different influx compositions (Number of ACSs). The graph shows the average number of ACSs in function of time. The error bar represents the standard error.

Figure 7
figure 7

Different influx compositions (Species belonging to ACSs). The graph shows the average number of species belonging to the ACSs in function of time. The error bar represents the standard error.

In other words, the activity of the whole system seems to be enhanced in those cases in which the influx is formed by a larger number of longer species (all with the same maximum length), with special regard to the creation of ACSs.

To dissect the complementary effects due to the presence of a larger number of species, on the one hand, and of species with larger lengths, on the other hand, we performed a second series of simulations in which we compared networks characterised by influxes composed of the same number of species and molecules, whereas the distribution of their length differs. In particular, we initialised networks with a ring disk in which all the species up to length 4 are present, while the influx is composed of a fixed number of species equal to the number of possible species up to length 3, i.e. 14; in the first ensemble in the influx there are all the species up to length 3, while in the second ensemble the influx is composed of all the species of length 1 and 2 and the remaining species are chosen with uniform probability distribution among the species of length 3 and 4.

Given that the difference among the curves regarding the variation of the average concentration of the molecules that do not belong to the initial ring disk seems not to be relevant (Figure 8), we could argue that the largest portion of the enhancement effect described above is due to the variation in the total number of species, the difference in their lengths playing a minor role.

Figure 8
figure 8

Different influx composition maintaining fixed the number of species. The graph represents the average number of molecules not belonging to the influx in function of time. The first curve represents the average behaviours of the experiments with an incoming flux composed of all the species up to length 3 whereas the second curve represents all the experiments having an incoming flux composed of all the species up to length 2 and the remaining 8 species randomly chosen from a uniform distribution containing all the species with length 3 and 4. The error bars represent the standard error.

One last important remark concerns the robustness of the emergent ACSs.

In [38] Bagley defines as "Autocatalytic Metabolism" (ACM) an ACS in which the concentrations of the composing elements make significant departures from the value they would have without catalysis. The results of the simulations seems to indicate that it is indeed difficult to observe an ACM and that, instead, most of the observed ACSs are not ACMs. In Figure 9 we can look, for instance, at the actual reaction graph concerning a particular experiment in which an ACS actually emerged, in three different moments of the simulation. We can see that some links forming the ACS are given by reactions occurring rarely within the temporal window, in one case only once. This outcome implicates that the probability that in the next temporal window that specific reaction does not occur is considerably high and this is what we observe in the graph corresponding to the successive time step. We would like to stress that this is not a peculiar case, rather it is a generic phenomenon. Almost all the ACSs are, in fact, characterised by at least one reaction that occurs rarely, representing therefore a bottleneck and hinting to a serious lack of robustness.

Figure 9
figure 9

Autocatalytic sets fragility. In the figure the actual reaction graph at three different times of a particular experiment is shown. The green nodes represent the species belonging to the incoming flux, the blue nodes are those forming the ACS and the white nodes are the new species created by the dynamics.

Summary and Discussion

The first aim of this work is to introduce and formally describe a stochastic dynamical model of catalytic reaction network highlighting the main novelties with respect to the existing models and the motivations at the basis of its introduction.

The application of the Gillespie algorithm to this kind of system was motivated by the importance that noise, random fluctuation and low-numbers-effects have with respect to the overall dynamics, in particular in "critical" systems, i.e. those systems that show a behaviour close to the phase transition in which ACSs begin to emerge. The model is indeed suitable to identify the key parameters and variable responsible for the observed dynamical behaviours.

Here we stress once more that the formal definition of a cycle within a stochastic asynchronous model is not unique and needs the setting of a specific temporal thresholds to define how long an observed phenomenon (i.e. a specific reaction) must be kept in the memory, hence possibly becoming a part of a cycle. Within this framework, the integrated analysis of both the structural properties of the system and of the dynamics is of paramount importance.

The analyses on critical ensembles focused the attention on the composition of the incoming flux, while the composition of the initial firing disk seems to play a minor role in the overall dynamics.

One first important result showed that the differences in the number of distinct species in the incoming flux actually influences the overall activity of the system, with noteworthy effects on the emergence of ACSs as well. Given a fixed number of species entering the reactor every second (since K is fixed), the more different species are present (although in fewer copies, because the overall input concentration is kept fixed), the more enhanced the activity of the system would be. Conversely, the presence of longer polymers in the incoming flux does not significantly alter the dynamics of the system under study.

It is noteworthy that our results highlight a dynamical structural fragility of ACSs, due to the presence of rarely occurring reactions that prevent the autocatalytic closure over a reasonable time span.

This result could partially explain the difficulties encountered in detecting ACSs in wet-lab experiments. Nonetheless, considering the large parameters space of the system under investigation, further analyses are required to corroborate these findings.