Background

The origin of life is a scientific problem that lies within the experimental realm.

The scientific consensus holds that life on this planet emerged through a process of gradual, incremental, chemical evolution from non-living matter, a process known as abiogenesis [1, 2].

In recent decades, a great effort has been made to reconstruct, at least partially, the physical-chemical steps leading to the origin of life on our planet. A definitive proof of concept would be provided by the ex novo assembly of a living system, that is, a membrane-bound structure characterized by self-replication and self-maintenance and capable of Darwinian evolution [3, 4].

Such a synthetic pathway would also help answer questions concerning the relative importance of chance and necessity in the historical origin of life on Earth [5, 6].

Ideally, an experimental approach should start from the simplest presumably prebiotic chemical precursors. Recently, however, an alternative approach, called "semi-synthetic", has been proposed. This approach seems not only well suited to provide the proof of principle that living forms might indeed emerge from the self-organization of their molecular components, but also experimentally feasible [4, 7, 8].

Liposomes, as closed spherical bilayers, are the most obvious candidates for cell compartments. Their key properties, subject to ongoing research, range from autopoiesis, or self-assembly from amphiphilic precursors in water [9–11], to their ability to incorporate various molecules, to their growth and division at the expense of free precursors [12–17].

In order to build liposome-based minimal cells, it is necessary to encapsulate the minimal set of molecules, such as proteins and nucleic acids, required to accomplish the essential processes typical of cellular life. Current research focuses on the synthesis of functional proteins inside liposomes. The translation "module" is, in fact, one of the most complex and important for cellular metabolism. Its implementation in liposome-based systems therefore not only allows the production of proteins (which in turn act as catalysts or structural components of the synthetic cell, extending the repertoire of available functions) but also models well the complexity of living cells.

Several protein expression systems have been successfully encapsulated inside liposomes [18–22] (for a review, see [8]). So far, these systems have been used to express the Green Fluorescent Protein (GFP), an easily detectable protein, and a few other enzymes. The final goal of this approach is the intraliposomal expression of enzymes that ultimately catalyze the regeneration of all internalized and membrane-forming components, so that a coordinated "core-and-shell" self-reproduction can be achieved [4, 23].

Our focus was on those approaches where a particular translation apparatus called the PURE system is encapsulated inside lipid vesicles (see Figure 1).

Figure 1

The PURE system: main metabolic blocks. A schematic drawing of the metabolic pathways included in the PURE system and hosted in a functionalized liposome (reproduced from [42] with permission from Springer, and from [35] with permission from Elsevier).

The PURE system (Protein synthesis Using Recombinant Elements) was developed by Ueda and colleagues [24] and is now commercially available. It comprises about 80 different macromolecular species (plus amino acids, nucleotides, etc.), representing the minimal set of the E. coli translational machinery (see Table 1). Thirty-six of these macromolecular species were purified individually by Ni/His6 affinity chromatography after overexpression in E. coli cells, while ribosomes are isolated by sucrose-density-gradient centrifugation. Mixing the isolated components in proper ratios effectively reconstitutes E. coli translation activity [24], which can then be used to synthesize a protein starting from an encoding DNA (or RNA) sequence. As Figure 1 shows, the PURE system is a modular tool that recapitulates in vitro the following metabolic processes: (i) aminoacylation of tRNAs; (ii) translation (initiation, elongation, termination); (iii) regeneration of the energetic species (ATP, GTP); and, optionally, (iv) transcription.

Table 1 PURE system components*

Protein-expressing vesicles have been created by encapsulating the PURE system components inside liposomes [25–30]. Water-soluble proteins such as GFP, Qβ-replicase and β-galactosidase have been effectively expressed inside lipid vesicles of diverse lipid compositions, sizes and morphologies, prepared by a variety of methods. Membrane proteins have also been expressed in specially designed liposomes [28].

Experiments are usually carried out by preparing liposomes in an aqueous solution containing the PURE system. Lipids can be dispersed in water by different methods, for example by swelling preformed thin lipid films, by rehydrating freeze-dried liposome membranes, or by injecting a small aliquot of a lipid-containing alcoholic solution. Upon dispersion of the lipids in the aqueous phase, liposomes form spontaneously. It is during this self-organization step that the PURE system components, floating in solution, are entrapped in the liposomes' inner space. Non-entrapped PURE system components are generally prevented from reacting by the addition of EDTA, RNases, or proteases, so that protein synthesis occurs only inside liposomes.

To produce a realistic in silico model of intraliposomal translation, both solute entrapment and protein synthesis must be simulated stochastically. Stochastic effects are expected to affect solute encapsulation efficiency, especially for small liposomes and low solute concentrations. It is generally accepted that the encapsulation of water-soluble solutes follows Poisson statistics: the mean number μi of entrapped molecules of the i-th solute (initially present in solution at concentration Ci) in a vesicle of volume V is μi = NA Ci V, where NA is the Avogadro number. Fluctuations around this mean are described by the Poisson distribution and imply that the actual outcome of the encapsulation step is a population of vesicles with different solute content (in terms of both the number of chemical species and their amounts). Furthermore, even independently of this effect, protein synthesis can behave stochastically when it occurs in small volumes with low numbers of solute molecules, as is the case in attoliter (10⁻¹⁸ L) containers. This behaviour depends on compartment size, because translation involves higher-than-first-order reactions whose likelihood depends, inter alia, on the reaction volume.
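The Poisson relation μi = NA Ci V directly gives the probability that a vesicle receives no copy of a solute, exp(-μi), and, if species are entrapped independently, the probability that all species are present at least once. The Python sketch below computes both; the uniform 1 µM concentration for 24 species is purely hypothetical and is not the PURE system composition.

```python
import math

AVOGADRO = 6.02214076e23  # N_A, mol^-1


def mean_entrapped(conc_molar: float, volume_litres: float) -> float:
    """Mean number of entrapped molecules, mu_i = N_A * C_i * V."""
    return AVOGADRO * conc_molar * volume_litres


def prob_empty(mu: float) -> float:
    """Poisson probability that a vesicle contains no copy of a solute."""
    return math.exp(-mu)


def prob_all_present(concs_molar, volume_litres: float) -> float:
    """Probability that every species is entrapped at least once,
    assuming each species is entrapped independently (Poisson)."""
    p = 1.0
    for c in concs_molar:
        p *= 1.0 - prob_empty(mean_entrapped(c, volume_litres))
    return p


# hypothetical example: 24 species, each at 1 uM
for v in (1e-16, 1e-17, 1e-18):
    print(f"V = {v:.0e} L  mu = {mean_entrapped(1e-6, v):.3g}  "
          f"P(all 24 present) = {prob_all_present([1e-6] * 24, v):.3g}")
```

Because the probabilities multiply across species, the chance of a "complete" vesicle collapses once μi drops to order one, which is why small vesicles are expected to be the most sensitive to entrapment statistics.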

This scenario is further complicated by recent, very intriguing findings suggesting that encapsulation might not follow Poisson statistics but could instead be governed by a power law. In particular, based on the observed occurrence of GFP production in 200 nm (diameter) vesicles, it was hypothesized that macromolecules could be captured by closing lipid membranes with a much higher probability than predicted by the Poisson distribution. To test this hypothesis, the physics of solute encapsulation was investigated thoroughly by cryo-transmission electron microscopy. By directly counting the actual number of molecules per vesicle, it was shown that when the protein ferritin [31] or ribosomes [32] are encapsulated inside lipid vesicles, most of the formed vesicles are actually empty, while a small minority (0.1-1%) contains an unexplained high number of solute molecules. This observation is not compatible with a Poisson entrapment model; it instead follows a power-law distribution, i.e., a distribution whose "long tail" predicts the occurrence of events very distant from the average behaviour (Figure 2). While these observations are a matter of fact, an explanation is still missing. No experimental studies are yet available to ascertain whether a power law governs the encapsulation of multiple solutes, as in the PURE system. Further, there are no hints about the existence of solute- or membrane-specific interactions that might cause a deviation from the Poisson distribution.
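To make the "long tail" concrete, the sketch below compares the probability of finding at least k0 molecules in a vesicle under a Poisson distribution and under a discrete power law P(N = k) ∝ k^-α. The mean μ = 5, exponent α = 2, cut-off kmax = 1000 and threshold k0 = 50 are hypothetical values chosen only to show the shape of the tails; they are not fitted to the ferritin data of [31], and the power law here ignores the many empty vesicles observed experimentally.

```python
import math


def poisson_tail(k0: int, mu: float, kmax: int) -> float:
    """P(k0 <= N <= kmax) for N ~ Poisson(mu), mu > 0; pmf evaluated in log space."""
    return sum(math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
               for k in range(k0, kmax + 1))


def powerlaw_tail(k0: int, alpha: float, kmax: int) -> float:
    """P(N >= k0) for a discrete power law P(N = k) ∝ k^-alpha on 1..kmax."""
    weights = [k ** -alpha for k in range(1, kmax + 1)]
    return sum(weights[k0 - 1:]) / sum(weights)


# hypothetical parameters, not fitted to experimental data
mu, alpha, kmax, k0 = 5.0, 2.0, 1000, 50
print("Poisson   P(N >= 50):", poisson_tail(k0, mu, kmax))
print("power law P(N >= 50):", powerlaw_tail(k0, alpha, kmax))
```

The Poisson tail is vanishingly small ten times above the mean, whereas the power law keeps a probability of the order of one percent, which is qualitatively the behaviour the cryo-TEM counts suggest.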

Figure 2

Over-crowding effect. Distribution of entrapped ferritin measured experimentally by electron microscopy, compared with the theoretically predicted Poisson distribution: the main divergence is the long tail of non-zero probability (redrawn from [31]).

Here we investigate, by an in silico approach, the expectations (in terms of measurable protein synthesis yield) that follow from a set of hypotheses currently under experimental and theoretical scrutiny.

Following an approach philosophically similar to that of a previous work [33], in which stochastic simulation was used to test the viability of candidate minimal-genome organisms, we test how efficiently different entrapment models generate liposomes able to perform complete protein synthesis. To do this, we used a dedicated simulator, QDC [34], which is based on Gillespie's stochastic simulation algorithm (SSA), can control experimental parameters in a metabolic experiment, and can handle very large and very small numbers of molecules and volumes.

We believe such an approach can advance the understanding of the implications of different hypotheses on the solute entrapment process. A simulation approach can conveniently overcome the technical limits that hinder the "wet" study of these micro-compartmentalized systems and can predict clear-cut experimental observables, thus providing a means to discriminate between alternative models.

Results

Formal specification of the PURE system

Under the simplifications declared in the Methods section, we derive a formal specification of the PURE system composed of 106 reactions that best describe the behaviour of the system. The complete list of simulated chemical species, reactions and kinetic coefficients is reported in Additional File 1 (written in the QDC input syntax); Table 2 reports a selection of the most important kinetic constants, along with a concise description of their derivation.

Table 2 Derivation of kinetic constants

The model was used to perform stochastic simulations of the PURE system in a set volume, representing the internal volume of a liposome. Varying the system volume between the simulations allowed us to study the impact of this variable on the system dynamics. Even though the PURE system is well known and commercially available, to the best of our knowledge this is the first work where its dynamic behaviour is studied by means of a stochastic simulation.

Simulating the protein synthesis in liposomes with the two entrapment models

Our primary target of study was translation efficiency, which we measured as the time τ the system requires to yield a fixed, arbitrary number of protein molecules per unit volume. We computed this quantity for both entrapment models, Poisson and power-law.
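The efficiency measure τ can be read off a protein time course as the first time at which the count per unit volume reaches the chosen threshold. The sketch below assumes the trajectory is given as parallel lists of times and protein counts; the function name, the toy trajectory and the threshold value are hypothetical and are not taken from the simulations.

```python
def time_to_threshold(times, counts, volume_litres, threshold_per_litre):
    """Return the first time at which the protein count per unit volume
    reaches the threshold, or None if it is never reached."""
    target = threshold_per_litre * volume_litres  # threshold as molecule count
    for t, n in zip(times, counts):
        if n >= target:
            return t
    return None


# toy trajectory (not simulation output); threshold chosen arbitrarily
times = [0.0, 10.0, 20.0, 30.0]
counts = [0, 2, 5, 9]
print(time_to_threshold(times, counts, 1e-16, 4e16))
```

Expressing the threshold per unit volume, rather than as an absolute count, is what makes τ comparable across vesicles of different size.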

τ (volume) for Poisson-distributed system species

For each simulated volume V, the mean number μi of each of the 24 system species present in the reaction compartment at time zero was calculated from the initial concentrations of PURE system species as reported by [35].

For each species i, μi was used as the parameter of a Poisson distribution, from which 360 instances of the encapsulated system with stochastically distributed species were generated.

The time course of the number of protein molecules was averaged instant by instant across all simulations, in order to cancel out stochastic effects not dependent on volume. The procedure was repeated for each simulated volume, in a range from 10⁻¹³ to 10⁻¹⁸ L. Figure 3 shows the result for a 10⁻¹⁶ L compartment (vesicle outer diameter ca. 580 nm). The nearly linear time course reflects good translational efficiency: the molecular machinery works steadily and the chemical resources are still available in relatively large quantities.
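SSA trajectories are piecewise constant and each run fires its events at different, irregular times, so averaging "instant by instant" requires sampling every run on a common time grid first. One way to do this, sketched below under that assumption, is sample-and-hold: each run takes the value of its last event at or before each grid time. The function names and the toy trajectories are hypothetical; QDC's own post-processing is not reproduced here.

```python
from bisect import bisect_right


def sample_step(times, values, grid):
    """Value of a piecewise-constant SSA trajectory at each grid time."""
    out = []
    for t in grid:
        i = bisect_right(times, t) - 1  # index of last event at or before t
        out.append(values[i] if i >= 0 else values[0])
    return out


def average_trajectories(trajectories, grid):
    """Mean over runs of the sampled trajectories, one value per grid time."""
    sampled = [sample_step(ts, vs, grid) for ts, vs in trajectories]
    n = len(sampled)
    return [sum(column) / n for column in zip(*sampled)]


# two toy runs with different event times
runs = [([0.0, 1.0, 3.0], [0, 1, 2]),
        ([0.0, 2.0], [0, 4])]
print(average_trajectories(runs, [0.0, 1.0, 2.0, 3.0]))
```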

Figure 3

Time course of protein synthesis in a 10⁻¹⁶ L vesicle, Poisson model. The abscissa shows time (s); the ordinate shows the number of synthesized proteins.

The data were used to derive the translation efficiency, as shown in Table 3.

Table 3 τ (volume) Poisson distribution

A peak of translation efficiency is revealed at around 3 × 10⁻¹⁶ L, corresponding to liposomes of approximately 840 nm (Figure 4).
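The diameters quoted alongside volumes in this section (ca. 580 nm for 10⁻¹⁶ L, 840 nm for 3 × 10⁻¹⁶ L, 275 nm for 10⁻¹⁷ L, 35 nm for 10⁻²⁰ L) appear consistent with treating V as the inner volume of a sphere, d = (6V/π)^(1/3), and adding a bilayer of about 4 nm on each side to obtain the outer diameter. That reading is our inference, not stated in the text, so the 4 nm bilayer thickness below is an assumed parameter.

```python
import math

BILAYER_NM = 4.0  # assumed bilayer thickness; inferred, not stated in the text


def inner_diameter_nm(volume_litres: float) -> float:
    """Diameter of a sphere whose volume equals the aqueous inner volume."""
    volume_m3 = volume_litres * 1e-3
    return (6.0 * volume_m3 / math.pi) ** (1.0 / 3.0) * 1e9


def outer_diameter_nm(volume_litres: float, bilayer_nm: float = BILAYER_NM) -> float:
    """Inner diameter plus one membrane thickness on each side."""
    return inner_diameter_nm(volume_litres) + 2.0 * bilayer_nm


for v in (3e-16, 1e-16, 5e-17, 1e-17, 1e-20):
    print(f"V = {v:.0e} L  inner = {inner_diameter_nm(v):.0f} nm  "
          f"outer = {outer_diameter_nm(v):.0f} nm")
```

The membrane correction is negligible for the largest vesicles but accounts for roughly a quarter of the outer diameter at 10⁻²⁰ L.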

Figure 4

τ(volume), Poisson model. The abscissa shows the vesicle volume (L); the ordinate shows the time needed to reach the synthesis threshold (s), with the axis orientation reversed.

This maximum is robust: we repeated five independent simulation runs, and each confirmed its existence and position. Its existence is somewhat expected from a general consideration that sees translational efficiency as the balance between two opposite contributions. The first is a purely kinetic effect: the smaller the vesicle volume, the faster the apparent kinetics of reactions of order higher than one. As almost all the reactions in the simulated PURE system are second order, their speed increases as the liposome size decreases. The second contribution derives from the Poisson entrapment process, which makes it increasingly probable that some necessary chemical species are missing as the liposome size decreases. It is therefore reasonable to expect an optimal vesicle size that contains all the necessary chemical species in the smallest possible volume. According to our results, translational efficiency decreases abruptly for vesicles smaller than this optimal size.
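The kinetic contribution can be seen in the standard Gillespie propensity of a bimolecular reaction A + B → products, a = (k / (NA V)) nA nB, with k in M⁻¹ s⁻¹: for a given number of reactant molecules, the propensity scales as 1/V, whereas first-order propensities do not depend on volume. The sketch isolates this single effect; the rate constant and molecule counts are hypothetical.

```python
AVOGADRO = 6.02214076e23  # mol^-1


def bimolecular_propensity(k_per_molar_s, n_a, n_b, volume_litres):
    """Propensity (s^-1) of A + B -> products with A != B.
    The stochastic constant is c = k / (N_A V), k in M^-1 s^-1."""
    return k_per_molar_s / (AVOGADRO * volume_litres) * n_a * n_b


def unimolecular_propensity(k_per_s, n_a):
    """Propensity of A -> products; independent of volume."""
    return k_per_s * n_a


# same molecule counts in two compartments differing tenfold in volume
k, n_a, n_b = 1e6, 10, 10  # hypothetical rate constant and counts
big = bimolecular_propensity(k, n_a, n_b, 1e-16)
small = bimolecular_propensity(k, n_a, n_b, 1e-17)
print(small / big)
```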

τ (volume) for power-law-distributed system species

For each simulated volume V, the corresponding "over-crowding concentration" cs was determined from the experimental data reported in [31] and graphically displayed in Figure 2. The number μs of each solute in an "overcrowded" vesicle was calculated as μs = cs NA V.

Solutes whose initial concentration according to [24] is higher than cs were not adjusted for the super-concentration effect, but left with their regular molecule numbers.

The frequency of "overcrowded" vesicles was inferred from [31] and taken to be between 0.1% and 1%, depending on the super-concentration level and hence also on volume. Specifically, the frequency was set to 1%, 0.2%, and 0.1% for vesicles of volume 10⁻¹⁶, 5 × 10⁻¹⁷, and ≤ 10⁻¹⁷ L, respectively (corresponding to vesicle diameters of ca. 580, 465, and ≤ 275 nm). This choice highlights the properties of the power-law distribution while avoiding a complex multivariate study.

The "non-overcrowded" vesicles were assigned a mean number μi of species i, calculated as μi = [ci NA V (Ls + Ln) - cs NA V Ls]/Ln, where ci is the regular concentration of species i in the PURE system as reported in [24], Ls is the number of overcrowded vesicles, and Ln is the number of non-overcrowded vesicles. The formula corrects for the molecules already entrapped in the overcrowded vesicles, which are therefore unavailable for entrapment in the non-overcrowded liposomes.
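The two-population power-law model described above can be sketched in Python with NumPy. Overcrowded vesicles receive the fixed count μs = cs NA V; species with ci ≥ cs are left unadjusted; the remaining vesicles receive μi from the correction formula. The text does not say how counts were drawn around μi, so Poisson sampling around μi (and around the regular mean for unadjusted species), clamping μi at zero, and treating the overcrowded frequency as a step function between the three quoted volumes are all assumptions of this sketch, as are the example concentrations.

```python
import numpy as np

AVOGADRO = 6.02214076e23  # mol^-1


def overcrowded_fraction(volume_litres: float) -> float:
    """1% at 1e-16 L, 0.2% at 5e-17 L, 0.1% at <= 1e-17 L (from the text);
    using steps between these volumes is an assumption."""
    if volume_litres >= 1e-16:
        return 0.01
    if volume_litres >= 5e-17:
        return 0.002
    return 0.001


def non_overcrowded_mean(c_i, c_s, volume_litres, n_over, n_regular):
    """mu_i = [c_i N_A V (L_s + L_n) - c_s N_A V L_s] / L_n"""
    nav = AVOGADRO * volume_litres
    return (c_i * nav * (n_over + n_regular) - c_s * nav * n_over) / n_regular


def sample_population(concs, c_s, volume_litres, n_vesicles, rng):
    """Rows are vesicles (overcrowded ones first), columns are species."""
    n_over = max(1, round(overcrowded_fraction(volume_litres) * n_vesicles))
    n_regular = n_vesicles - n_over
    nav = AVOGADRO * volume_litres
    pop = np.empty((n_vesicles, len(concs)), dtype=np.int64)
    for j, c_i in enumerate(concs):
        if c_i >= c_s:
            # not adjusted for super-concentration: regular Poisson entrapment
            pop[:, j] = rng.poisson(c_i * nav, size=n_vesicles)
        else:
            # overcrowded vesicles: fixed count mu_s = c_s N_A V
            pop[:n_over, j] = round(c_s * nav)
            mu_i = non_overcrowded_mean(c_i, c_s, volume_litres, n_over, n_regular)
            # assumption: Poisson counts around mu_i, clamped at zero
            pop[n_over:, j] = rng.poisson(max(mu_i, 0.0), size=n_regular)
    return pop


rng = np.random.default_rng(0)
# hypothetical concentrations (M) and over-crowding concentration c_s
pop = sample_population([1e-6, 1e-3], c_s=1e-4, volume_litres=1e-17,
                        n_vesicles=1000, rng=rng)
print(pop.shape, pop[0], pop[1:, 0].mean())
```

The correction term conserves the total number of molecules: μi Ln + μs Ls equals ci NA V (Ls + Ln).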

For each volume V, in a range from 10⁻¹⁶ to 10⁻²⁰ L (diameters from 580 to 35 nm), 1000 instances of the system were generated following the procedure described above.

Contrary to the Poisson model, the system reaches a protein production plateau within the simulation time limit (Figure 5). This is due to the rapid consumption of amino acids and energetic species by the over-concentrated PURE system enzymes. Results are presented in Table 4. The plateau and the generally poor yield of the system made it impractical to use the same efficiency measure as in the Poisson series. In line with our aim of providing a means to discriminate between Poisson and power-law statistics in the wet lab, we report instead the time required to reach the plateau and the number of proteins at plateau per overcrowded vesicle.
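The text does not state the exact criterion used to locate the plateau. One simple operationalization for a piecewise-constant SSA trajectory, sketched below as an assumption, takes the time of the last increase in protein count as the time to plateau and the final count as the plateau level; it is only meaningful if the count has actually stopped changing before the simulation ends. The toy trajectory is hypothetical.

```python
def plateau(times, counts):
    """Return (time of the last increase in protein count, plateau level)
    for a piecewise-constant trajectory."""
    t_last = times[0]
    for i in range(1, len(counts)):
        if counts[i] > counts[i - 1]:
            t_last = times[i]
    return t_last, counts[-1]


# toy trajectory (not simulation output)
print(plateau([0.0, 1.0, 2.0, 3.0, 4.0], [0, 3, 5, 5, 5]))
```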

Figure 5

Time course of protein synthesis in a 10⁻¹⁷ L vesicle, power-law model. The abscissa shows time (s); the ordinate shows the number of synthesized proteins.

Table 4 τ (volume) power-law distribution

Conclusions

The in silico analysis of the translational efficiency of liposomes entrapping the PURE system reveals a size threshold of 10⁻¹⁶ L, below which the Poisson entrapment model predicts no detectable synthesis. The main experimental observable is therefore protein synthesis, if any, in liposomes of about 10⁻¹⁷ L volume (275 nm diameter) or less. If the PURE system is entrapped in liposomes of this size together with an mRNA encoding GFP, the detection of a fluorescence signal would indicate that the real distribution of solutes in liposomes follows power-law statistics rather than a Poisson process. Such an experimental validation could improve our understanding of the basic mechanism underlying the relationship between the formation of membrane compartments and the distribution of solutes within them.

Methods

Deriving a formal specification for the PURE system

To specify a simulation-amenable model, the biochemical transformations carried out by the PURE system components were decomposed into elementary reactions (first- or second-order), whose deterministic kinetic coefficients were harvested from the literature or, when unavailable, deduced from global thermodynamic and/or chemical considerations or by imposing internal coherence on the system. The level of detail chosen to specify the system results from two complementary needs: keeping the computational cost as low as possible (it is very high for SSA-based simulators) and explicitly representing every significant entity present in the system. The main simplifications adopted are the following: amino acids, tRNAs, aminoacyl-tRNA synthetases, and translation release factors are each modelled as a single species representing its entire class (e.g., one "general tRNA" in lieu of the 56 actual tRNAs of the PURE system). These simplifying assumptions reduce the number of chemical species involved in the transcription/translation reactions to 24. Furthermore, every round of translation is designed to yield a complete, 300-amino-acid protein (which models GFP); in other words, there is no abortive translation. This agrees well with wet-lab observations: autoradiography of synthesized protein after electrophoretic separation typically reveals only one major protein band [28].

Performing a stochastic simulation of the biochemical species entrapped in the vesicles

Stochastic simulations were performed with QDC (Quick Direct-method controlled), a simulator based on the direct-method version of Gillespie's SSA, described in detail in [34]. We chose this simulator mainly because it implements a series of checking tools that allowed us to verify the reliability of the simulated data. In particular, it supports very large and very small quantities and checks for possible overflow/underflow of variables, which can occur given the volumes and concentrations used in our virtual experiments. Moreover, QDC outputs three files: one contains the time course of the number of molecules of each chemical species; one contains the time course of the propensity of each chemical reaction; and the last contains the effective firing rate of each reaction. By analysing these three files, it is possible to detect simulation artefacts or system properties that can alter the simulation (e.g., the possible stiffness of a system). We made extensive use of these features to rule out numerical errors or unsuitable descriptions of the biochemical system being simulated.
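For readers unfamiliar with the direct method, the sketch below is a generic Python implementation of Gillespie's direct-method SSA: draw an exponential waiting time with rate a0 (the sum of all propensities), then pick reaction j with probability aj/a0. It is not QDC and includes none of QDC's overflow/underflow checks or output files; the dictionary-based state and the `(propensity_fn, stoichiometry)` reaction format are choices made for this illustration.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def gillespie_direct(x0, reactions, t_end, seed=None):
    """Gillespie direct method.

    x0: dict mapping species name -> initial molecule count.
    reactions: list of (propensity_fn, stoichiometry) pairs, where
      propensity_fn(state) returns the propensity in s^-1 and
      stoichiometry maps species name -> change in count.
    Returns the event times and the state after each event."""
    rng = random.Random(seed)
    x = dict(x0)
    t = 0.0
    times, states = [t], [dict(x)]
    while True:
        cumulative = list(accumulate(f(x) for f, _ in reactions))
        a0 = cumulative[-1]
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # waiting time ~ Exp(a0); 1 - random() lies in (0, 1], so log is safe
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # choose reaction j with probability a_j / a0
        j = bisect_right(cumulative, rng.random() * a0)
        j = min(j, len(reactions) - 1)  # guard against floating-point edge cases
        for species, delta in reactions[j][1].items():
            x[species] += delta
        times.append(t)
        states.append(dict(x))
    return times, states


# usage: irreversible A + B -> C in a 1e-17 L compartment (hypothetical k)
NA, V, k = 6.02214076e23, 1e-17, 1e6  # k in M^-1 s^-1
rxns = [(lambda s: k / (NA * V) * s["A"] * s["B"], {"A": -1, "B": -1, "C": 1})]
times, states = gillespie_direct({"A": 50, "B": 40, "C": 0}, rxns, t_end=10.0, seed=42)
print(len(times) - 1, "events; final state", states[-1])
```

Every event recomputes all propensities, which is why the computational cost of SSA-based simulators grows quickly with the number of reactions and molecules.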

Generating the entrapment models

The Poisson distribution model was generated by an in-house Python script that computes the average number of molecules of each species in a vesicle, given the vesicle volume and the concentration of the species in the original solution. The script then calls a Poisson subroutine from a Python statistical library, which outputs a Poisson-distributed variable.
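The original script is not reproduced here. A minimal equivalent, assuming NumPy's `Generator.poisson` as the Poisson sampler (the text does not name the library), draws one row of initial counts per vesicle instance with species means NA Ci V. The function name and the example concentrations are hypothetical; the default of 360 instances matches the Results.

```python
import numpy as np

AVOGADRO = 6.02214076e23  # mol^-1


def poisson_instances(concs_molar, volume_litres, n_instances=360, seed=None):
    """Draw initial molecule counts for n_instances vesicles.
    Rows are vesicles, columns are species; species i has mean N_A * C_i * V."""
    rng = np.random.default_rng(seed)
    mu = AVOGADRO * np.asarray(concs_molar, dtype=float) * volume_litres
    return rng.poisson(mu, size=(n_instances, mu.size))


# hypothetical concentrations (M) for three species
counts = poisson_instances([1e-6, 2e-6, 5e-8], 1e-16, seed=0)
print(counts.shape, counts.mean(axis=0))
```

Each row can then serve as the initial state of one stochastic simulation run.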

The power-law entrapment model was generated by direct inference from the experimental data, as described in the Results section.