Journal of Molecular Modeling

Volume 18, Issue 2, pp 607–609

An information-carrying and knowledge-producing molecular machine. A Monte-Carlo simulation

Open Access · Original Paper

DOI: 10.1007/s00894-011-1081-9

Cite this article as:
Kuhn, C. J Mol Model (2012) 18: 607. doi:10.1007/s00894-011-1081-9


The concept called Knowledge is a measure of the quality of genetically transferred information. Its usefulness is demonstrated quantitatively in a Monte-Carlo simulation of critical steps in an origin-of-life model. The model describes the origin of a bio-like genetic apparatus by a long sequence of physical-chemical steps: it starts with the presence of a self-replicating oligomer and an environment specifically structured in time and space that allows for the formation of aggregates such as assembler-hairpins-devices and, at a later stage, an assembler-hairpins-enzyme device, a first translation machine.


Assembler-hairpins-enzyme device; Emergence and storage of information; Genetic apparatus; Knowledge; Monte-Carlo; Origin of life; Structured time-space environment


Modeling the origin of life: postulates on initial conditions

This paper is based on the concept that living individuals (distinct aggregates of interlocking molecules) are a form of matter that carries information to be reproduced and to evolve in their given environment into forms of increasing complexity and intricacy [1–7].

A step-by-step Darwinian process sets in instantaneously when the first entity (a de-novo oligomer that is capable of replication) emerges by chance, survives as a species (a population of individuals of a certain kind) and then evolves by continued reproduction with variation and selection in the particular environment driving the process. The initial step, which has been called "origin of life", requires a basic initial form for the building blocks as well as basic initial conditions for the environment (Supplementary Fig. 1). Such a step-by-step Darwinian process, continuous and sustained, leads first to aggregates of interlocking molecules, and then to the bio-genetic apparatus.

These aggregates reach increasing independence from the afore-mentioned highly specific initial conditions by gradually populating a diversified area, as follows: (1) populating regions of compartments with increasing size of pores, (2) evolving devices that produce envelopes as innate compartments, (3) evolving metabolism (increasing intricacy of the living machinery).

Modeling origin of life: the RNA world

We consider the case where only one pair of complementary R-monomers, R1 and R2, is present, and we assume that de-novo oligomers emerge and are capable of replicating (Supplementary Fig. 2a). The initial sequence is random and, due to errors in the copying process, the sequence varies. If by chance a strand arises in the population with a sequence that folds onto itself to form a hairpin (Supplementary Fig. 2c), the hairpin will be selected because it is more protected against hydrolysis and thus has a higher probability of surviving. The hairpin binds by its loop to an open strand through complementary, anti-parallel pairing. If the number of adjacent hairpins equals or exceeds three, such an aggregate will again be selected because it is more protected against hydrolysis and thus more likely to survive. A reading frame (Supplementary Fig. 2d) exists if all hairpins in the population have loops of one kind. All hairpins bound to the assembler are then adjacent (Supplementary Fig. 2e).
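The hairpin condition can be made concrete with a minimal sketch. It assumes that strands are encoded as lists of 0 (R1) and 1 (R2), that R1 pairs only with R2, and that a hairpin is a strand whose first n monomers pair anti-parallel with its last n monomers. The function names and the encoding are illustrative and are not taken from the author's implementation; the defaults n = 10 and m = 23 are the values quoted in the supplementary captions.

```python
import random

def complement(monomer):
    """R1 (0) pairs with R2 (1) and vice versa (assumed encoding)."""
    return 1 - monomer

def is_hairpin(strand, n=10):
    """True if the first n monomers pair anti-parallel with the last n."""
    if len(strand) < 2 * n:
        return False
    return all(strand[i] == complement(strand[-1 - i]) for i in range(n))

def random_strand(m=23, rng=random):
    """Open strand with a random sequence of R1/R2 monomers."""
    return [rng.randint(0, 1) for _ in range(m)]

def make_hairpin(stem, loop):
    """Hypothetical helper: stem + loop + anti-parallel complement of the stem."""
    return stem + loop + [complement(x) for x in reversed(stem)]
```

With a stem of 10 monomers and a strand of 23 monomers, the loop has 3 monomers, which is consistent with the eight possible loop sequences mentioned in the caption of Supplementary Fig. 2.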

The early process (RNA-world [8–11]) comes to an end when replication reaches a certain precision. This apparent dead-end is overcome by the emergence of the first translation apparatus with an enzyme E1 as its product (i.e., the assembler-hairpins-enzyme device, the HAE1-device and its nonsensical replica, the HAE0-device as explained below).

Modeling the origin of life: the RNA–protein world

We introduce in our model two a-monomers, a1 and a2 (which may be glycine and alanine). Occasionally, by errors in the copying process, the two R-monomers at the open end (Supplementary Fig. 3a) of the hairpin are no longer complementary. The hairpin then carries an a-monomer such that R1–R1 carries a1 and R2–R2 carries a2 (Supplementary Fig. 3a). If all hairpins on an assembler carry a-monomers, these can then oligomerize. The product, an a-oligomer with random sequence, has no enzymatic power but functions as an agglutinate-forming envelope E0 (Supplementary Fig. 3b).

The HAE1-device is a translation apparatus (Supplementary Fig. 3c), and emerges as a by-product. It is selected only if (1) there is a code on the (+)-assembler for this specific a-oligomer, (2) this code is translated by the hairpins, and (3) the specific a-oligomer is an enzyme (enzyme E1) that increases the precision of the R-replication (replicase). The (−)-assembler constitutes the anti-parallel complementary copy of a (+)-assembler, and its product therefore has no sense (HAE0-device).

Increasing contamination coupled with increasing complexity of the evolving system (i.e., by accumulation of HAE0-devices with an increasing number of different HAEi-devices) again leads to another barrier. This barrier is overcome by a fundamental change in the machinery into a primordial form of the translation apparatus (DNA–RNA–protein world [12–16]). The emergence of a bio-genetic apparatus [7] then paves the way for the “explosion of life” [17–19].

Computer implementation and simulation

Supplementary Fig. 4 presents the computer implementation [20, 21] in a flow chart overview format. The top part shows the construction phase, where the aggregates are formed. According to the fitness of the formed devices, parameters such as survival chance, replication probability and the probability of occurrence of an error in the replication, are assigned to these entities. Possible aggregates are the hairpin (Supplementary Fig. 2c), the HA-device (Supplementary Fig. 2e) and, after emergence of new kinds of monomers (a-monomers) that attach to the open ends of hairpins, the HAE0- (Supplementary Fig. 3a) and the HAE1- (Supplementary Fig. 3c) devices. In the selection phase (middle of Supplementary Fig. 4), only a fraction of the population survives (corresponding to half of the total number of strands). In the multiplication phase (bottom of Supplementary Fig. 4), the aggregates dissociate and the strands are copied by chance until the total number of strands is replenished. Two typical simulations starting with a single strand of random sequence are shown in Supplementary Fig. 5a,b.

Quality of information Knowledge demonstrated by a Monte Carlo study

Defining quantities that measure Information and Knowledge elucidates some principal aspects of the origin of life. Genetic information is stored in one entity and transferred from one generation to the next as a distinct sequence of monomers in a strand. It is measured in bits [22].

"The quality of genetic information" denotes that the genetic information has the property to instruct the formation of an entity that behaves as if it had knowledge, that is, it behaves as if it would know how to survive and to multiply in its given environment.

Knowledge, K, is related to the effort required to develop a certain degree of functionality. Let us consider the total information I(g) (number of bits at generation g) that has to be discarded by eliminating unfit individuals along a single trial of Darwinian evolution. This evolution takes place within a population of entities, starting from the initial replicating oligomer at generation 0 and continuing through many step-by-step Darwinian processes until the given degree of functionality is reached at generation g. Averaging over the repeated process (Monte Carlo method) gives the breakthrough generation gbt and levels out random fluctuation in g. The effort required to develop the given degree of functionality is measured by the number K = I(gbt), called Knowledge K [3, 7, 23].
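As a hedged illustration of this definition, the sketch below computes K = I(gbt) from a set of breakthrough generations collected over repeated runs. It uses the linear form I(g) = g (N/2) m given in the captions of Supplementary Figs. 6 and 7 (N strands after multiplication, m monomers per strand, one bit per monomer). The function names are hypothetical, and the choice of the arithmetic mean as the default average is an assumption; another estimator, such as the median, can be passed in instead.

```python
from statistics import mean

def discarded_bits(g, n_strands=384, monomers=23):
    """I(g): bits thrown away after g generations, I(g) = g * (N/2) * m."""
    return g * (n_strands // 2) * monomers

def knowledge(breakthrough_generations, n_strands=384, monomers=23, average=mean):
    """K = I(g_bt), where g_bt averages the breakthrough generation over runs."""
    g_bt = average(breakthrough_generations)
    return discarded_bits(g_bt, n_strands, monomers)
```

For example, `knowledge(runs)` with `runs` holding the generation at which each simulated run first reached a given device returns the corresponding Knowledge in bits for N = 384 and m = 23.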

Beginning with a situation in which the emergence of a first replicating strand is possible, it takes a huge number of trials (i.e., discarding much information) until a replicable strand actually appears: Knowledge K grows suddenly from zero to a value K0 given by the sum of discarded bits (see schematic illustration in Supplementary Fig. 6a, b). Thereafter, Knowledge K increases stepwise: it remains constant (reflecting the gradual adaptation of the evolving forms to a given region by refinements of their organizational structure) until a major change in the organizational structure appears, with a stepwise increase in Knowledge K (a breakthrough process leading to the colonization of a new region). Supplementary Fig. 7a (analyzing the emergence of hairpins) and Supplementary Fig. 7b (showing Knowledge K of the evolution from replicating strands to the HAE1-device) are evaluated from the statistics of the Monte Carlo study. Building up the HA-device with its reading frame requires many generations, whereas the HAE1-device emerges shortly after a-monomers appear that are able to link to the hairpin’s open end.


Critical steps in a model of the origin of a bio-like genetic apparatus have been demonstrated by computer simulation and analyzed by the Monte Carlo method. Establishing a reading frame, from a state of randomly formed aggregates of hairpins to a well-ordered HA-device, required many generations of a step-by-step Darwinian process. Modeling the origin of a bio-like genetic apparatus is important for understanding the principles of the origin of life and for stimulating experimental tests of the origin-of-life model. The model runs from the de-novo emergence of a self-replicating oligomer, in an environment specifically structured in time and space that allows the formation of aggregates such as assembler-hairpins-devices, through continued evolution toward increasing complexity, which ultimately leads to the assembler-hairpins-enzyme device. The concept of Knowledge is therefore a useful measure of the quality of information along Darwinian evolution.


I would like to express my gratitude to Hans Kuhn for his continued and sustained encouragement and Evelyne Bozzi for kindly copy editing the manuscript.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Supplementary material

Supplementary Fig. 1

Postulates. Basic initial conditions that start a step-by-step Darwinian process and sustain Darwinian evolution. The continuous extension of the populated area in the highly diversified world, beginning with replicating oligomers, R, emerging at a very precise location, is the basic mechanism leading to the origin of life. A continuous drive towards increasing complexity is given by the necessity for the increasing sophistication required to populate decreasingly favorable regions. The conditions of the given postulates are fundamental requirements for emergence of a genetic apparatus as a sequence of physicochemical steps (JPG 172 kb)

Supplementary Fig. 2

a–e Entities in the RNA-world: strand, hairpin and hairpin-assembler (HA)-device. a Strand replication of R-strands. An R-oligomer serves as a template. Energy-rich R monomers bind to the template by complementary pairing occasionally with an error. The growing strand oligomerizes in an anti-parallel direction; this replicate is not a copy of the template. The replica of the replica is a true copy (aside from incorporated errors) of the template considered. b Diffusion of entities between compartments provided by the environment. c Hairpins and their aggregates. Open strands are the R-oligomers with random sequences of R-monomers R1 and R2. The formation of hairpins, i.e., folding structures with the largest possible pairing domain, requires replication, where template and replica strands are complementary and anti-parallel. An open strand serves as an assembler for aggregation of hairpins, with hairpin-loops being complementary. In the computer implementation, the hairpins and aggregates with ≥ three adjacent hairpins are assigned fitness scores of S = 2 and S = 4 (selection probability parameter, see Supplementary Fig. 4), being two and four times more likely to survive, respectively, than an open strand (S = 1). d, e HA-device and reading frame. Initially the eight possibilities of loop sequences (four pairs of hairpin-loops) are equally probable. A reading frame (green) exists only if all hairpins in the population have loops of one kind, i.e., one pair of hairpins. A compact aggregate, the HA-device with all its hairpins adjacent, is easily formed when all hairpins in the population have loops of one kind, collectively forming the reading frame (reading frame 1 as defined in Supplementary Fig. 2d). HA-devices are assigned fitness S = 8 (Supplementary Fig. 4) (JPG 149 kb)

Supplementary Fig. 3

a–c Forms in the RNA–protein world: assembler-hairpins-enzyme (HAE)0-device and HAE1-device. a HAE0-device. Attachment of a-monomers to hairpins. Oligomerization of a-monomers by HAE0-device. It is assumed that a new kind of monomer (a-monomers) is present at this stage. In rare instances (by replication errors) the lower ends of hairpins open (from states R1–R2 or R2–R1 to states R1 R1 or R2 R2) and a-monomers attach (a1 attaches to R1 R1 and a2 to R2 R2, respectively). Four possibilities between the anti-codon and the attached a-monomers are a priori equally probable. Thus, a translation does not exist. When each hairpin in the HA-device carries an a-monomer, the a-monomers oligomerize to form an a-oligomer. The HAE0-device produces an agglutinate-forming envelope (an a-oligomer E0 with random sequence). b Agglutinate as envelope. HAE0-devices produce envelopes as self-made compartments. Division of the envelope occurs when the amount of HAE0-devices within the envelope rises above a critical value. Envelopes containing HAE0-devices are Darwinian entities and, being two times more probable to survive than HA-devices, are assigned the fitness S = 16 (Supplementary Fig. 4). c HAE1-device: a translation apparatus emerging as a by-product. The code on the (+)-assembler is translated by hairpins that belong to one pair (code 1 by pair 1). The product is an enzyme, E1, that acts as both a replicase and a synthetase. Envelopes containing HAE1-devices and HAE0-devices are assigned the fitness M = 2 (replication probability parameter), being two times more probable to replicate and with less replication error probability W = w/3 (Supplementary Fig. 4) than envelopes with HAE0-devices alone (M = 1 and W = w). The replicate of the (+)-assembler, the (−)-assembler, leads to an a-oligomer, E0, of unspecific sequence with no catalytic activity, but which helps compartmentation by agglutination (JPG 209 kb)

Supplementary Fig. 4

Computer implementation. Construction phase, selection phase and multiplication phase as successive parts of a step-by-step Darwinian process (one generation). Construction phase: searching for specific pairing conditions by trial and error; assigning fitness parameters to the aggregates thus formed (S selection probability parameter, M replication probability parameter, W replication error probability). Possible aggregates are the hairpin (Supplementary Fig. 2c), the HA-device (Supplementary Fig. 2e), the HAE0-device (Supplementary Fig. 3a) and the HAE1-device (Supplementary Fig. 3c). Selection phase: selection of entities surviving by chance according to fitness until the number of oligomers surviving is N/2, e.g., the entity of kind k numbered j survives if a random number falls into the range of Sk,j (where the total range of possible random numbers is the sum of Sp,q over all entities that were present before the selection). Multiplication phase: aggregate dissociation, strand replication, replication with chance of error according to fitness until the total number of oligomers present is N, e.g., the strand numbered i replicates if a random number falls into the range of fitness Mi (where the total range is the sum of Mi over all strands that were present before the multiplication) (JPG 109 kb)
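The selection and multiplication phases described in this caption can be sketched as fitness-proportional ("roulette wheel") sampling. This is a simplified sketch under stated assumptions, not the author's code: every entity is treated as a single strand (aggregates containing several strands are not modeled), strands are lists of 0 (R1) and 1 (R2), a repeated draw of an entity that has already survived is ignored, and all function and parameter names are hypothetical. S, M and W follow the caption: selection fitness, replication fitness and replication error probability per monomer.

```python
import random

def roulette_index(weights, rng):
    """Pick an index with probability proportional to its weight."""
    r = rng.uniform(0, sum(weights))
    upper = 0.0
    for i, weight in enumerate(weights):
        upper += weight
        if r < upper:
            return i
    return len(weights) - 1  # guard against floating-point round-off

def select(strands, S, n_total, rng):
    """Selection phase: keep n_total // 2 strands, drawn according to fitness S."""
    survivors = set()
    while len(survivors) < min(n_total // 2, len(strands)):
        survivors.add(roulette_index(S, rng))  # repeated hits are ignored
    return [strands[i] for i in sorted(survivors)]

def replicate(strand, w, rng):
    """Anti-parallel complementary copy; each monomer is miscopied with probability w."""
    copy = [1 - x for x in reversed(strand)]
    return [1 - x if rng.random() < w else x for x in copy]

def multiply(strands, M, W, n_total, rng):
    """Multiplication phase: copy survivors according to fitness M until n_total strands exist."""
    population = list(strands)
    while len(population) < n_total:
        i = roulette_index(M, rng)  # templates are the strands present before multiplication
        population.append(replicate(strands[i], W[i], rng))
    return population
```

One generation is then `multiply(select(strands, S, N, rng), M, W, N, rng)`, with S, M and W reassigned in a construction phase that is not sketched here. Copying a strand twice with w = 0 returns the original sequence, matching the statement in Supplementary Fig. 2a that the replica of the replica is a true copy.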

Supplementary Fig. 5

a,b Computer simulation. Total number of oligomers N = 384 after multiplication. Breakthrough steps at generation number g. a Hairpin (g = 190), HA-device with reading-frame 1 (g = 710), HAE0-device (g = 2950), HAE1-device with code 2 (g = 3040). b Hairpin (g = 310), HA-device with reading-frame 2 (g = 3850), HAE0-device (g = 4870), HAE1-device with code 2 (g = 4900) (JPG 106 kb)

Supplementary Fig. 6

a,b Concept of Knowledge K. a Schematic. Knowledge K of an evolving system is the number of bits thrown away on average (leveling out random fluctuation of the single process by the Monte Carlo method of the repeated process) to reach a given functionality of its forms. Knowledge K is then constant until the next breakthrough step. Non-living state: K = 0. Living state: instantaneous onset and stepwise increase of K. To construct replicating oligomers with 11 monomers of arbitrary sequence (assuming a probability of 0.01 linking two monomers, and assuming a probability of 0.9 that the replicable form dies out in the critical phase) requires \( 10^{21} \approx 2^{70} \) oligomers to be formed de-novo and discarded until a replicating oligomer occurs occasionally by chance. On average, 70 bits are discarded. K0 = 70 bits is the Knowledge of the replicating oligomer. b Knowledge K independent of size of population. Modeling each replication by “doubling the genetic material” and each selection by “discarding half of genetic material” requires the number of bits thrown away to be (N/2) m, where N is the number of strands after replication, and m is number of monomers per strand. The total number of bits thrown away, I(g) = g (N/2) m as a function of the number of generations, g, is a straight line. Consider two consecutive breakthroughs from the initial form to mutant 1 and to mutant 2. Knowledge K is given by K1 = I(g1) and K2 = I(g2). The increase of Knowledge ΔK = K2−K1 does not depend on the size of the population (small size S and large size L): mutant 2 appears earlier (gL2−gL1 < gS2−gS1) in the case of the larger population (NL > NS), with equal total genetic information (number of bits) thrown away (JPG 68.8 kb)
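The arithmetic behind K0 and the population-size argument can be checked with a short sketch. It assumes that an oligomer of 11 monomers needs 10 independent links, each formed with probability 0.01, and that a replicable form survives the critical phase with probability 1 − 0.9. The variable and function names are illustrative.

```python
import math

p_link = 0.01          # probability of linking two monomers (from the caption)
n_monomers = 11        # length of the replicating oligomer
p_survive = 1 - 0.9    # the replicable form dies out with probability 0.9

# Assumption: 11 monomers require 10 independent links.
p_success = p_link ** (n_monomers - 1) * p_survive
trials = 1 / p_success        # about 1e21 oligomers formed and discarded
k0_bits = math.log2(trials)   # about 69.8, i.e. roughly 70 bits

def generations_needed(delta_k_bits, n_strands, monomers):
    """Generations needed to discard delta_k_bits when I(g) = g * (N/2) * m."""
    return delta_k_bits / ((n_strands / 2) * monomers)
```

For a fixed ΔK, `generations_needed` scales as 1/N: doubling the population halves the number of generations between two breakthroughs while the number of bits thrown away stays the same, which is the point of panel b.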

Supplementary Fig. 7

a,b Knowledge K evaluated from the Monte Carlo study. a Hairpin formation: distribution to reach breakthrough-step. Hairpin formation: breakthrough step from open strands to hairpins. Probability \( P^{g-1} - P^{g} \) that a hairpin appears in generation g (red), where \( P = 1 - N n w / 2^{n} = 0.9925 \) is the probability that “no hairpin within one generation”, n = 10 is the number of complementary pairs forming the pairing domain of the hairpin, N = 384 is number of oligomers (open strands and hairpins), and w = 0.002 is the error probability per monomer in replication. Histogram (below) for number of generations, g, to reach the breakthrough step from open strands to hairpins obtained by Monte-Carlo computer simulations (gmean = 190 and gmedian = 138). b Formation of entities, breakthrough-steps at g = gbt. I(g) is the total cumulative number of bits thrown away along the generations g. I(g) = g (N/2) m, where N = 384 is the number of strands after each multiplication process, m = 23 is the number of monomers per strand and N/2 is the number of strands thrown away in each selection process, thus I(g) is a linear function (black line). Knowledge K (red) is constant, until the next breakthrough step. Breakthrough steps averaged by the Monte-Carlo method, over many computer simulations; two runs singled out are shown in Supplementary Fig. 5a, b (with standard deviation): hairpin at \( g_{\text{bt}} = 138 \ (\Rightarrow 138 \pm 130) \), HA-device at \( g_{\text{bt}} = 1671 \ (\Rightarrow 138 + 1533 \pm 1250) \), HAE0-device at \( g_{\text{bt}} = 1782 \ (\Rightarrow 1671 + 111 \pm 30) \), HAE1-device at \( g_{\text{bt}} = 2100 \ (\Rightarrow 1782 + 318 \pm 270) \) (JPG 73.6 kb)
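The waiting-time expression in panel a can be evaluated directly. The sketch below computes P from the quoted parameters and the probability that the first hairpin appears in generation g, reading the caption's formula as a geometric distribution with per-generation success probability 1 − P. The function names are hypothetical, and this idealized curve is only the analytical expression, not a reproduction of the simulated histogram.

```python
def p_no_hairpin(n_strands=384, n_pairs=10, w=0.002):
    """P = 1 - N*n*w / 2**n: probability of no hairpin within one generation."""
    return 1 - n_strands * n_pairs * w / 2 ** n_pairs

def p_first_hairpin(g, P):
    """Probability that the first hairpin appears in generation g (g >= 1)."""
    return P ** (g - 1) - P ** g

P = p_no_hairpin()
# Summing over many generations should approach 1, since a hairpin
# eventually appears with certainty under this expression.
total = sum(p_first_hairpin(g, P) for g in range(1, 5001))
```

With the caption's values, `P` evaluates to 0.9925, and `p_first_hairpin(g, P)` decreases monotonically with g.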


Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Biomedical Optics Research Laboratory, Clinic of Neonatology, University Hospital Zürich, Zurich, Switzerland