The RNA world hypothesis assumes that genetic information is encoded in an RNA template and transmitted to the replicons via the complementarity principle. The replicons form tertiary structures, employing the same principle, that are responsible for molecular recognition and catalysis. Thus, the RNA world relies on the complementarity principle for both the replication and the catalytic activity of RNA. However, the emergence of the complementarity principle without prior evolutionary steps is doubtful (Koonin 2012). This and many other unresolved problems led to suggestions that RNA must have been preceded by a simpler genetic material capable of continuous evolution into modern biochemistry (Anastasi et al. 2007; Joyce 2002; Orgel 2004). Several attempts have been made to design such molecules, but most of them are double-helical polymers using the same complementarity principle (Orgel 2004). Moreover, none of them is much simpler than RNA, and each introduces the additional problem of "genetic takeover", the displacement of these genetic materials by RNA. Can a stable tertiary structure, capable of molecular recognition and catalysis, be produced without complementarity or any other preexisting rules? To address this question, I consider here the likelihood of the de novo appearance of nucleic acid quadruplexes on primitive Earth. The properties of the quadruplex world's main structural domain, G3N (G3NG3NG3NG3; where N is any base), are discussed. Next, I consider the formation of self-reproducing tetrahelices and the role of temperature (heating/cooling) cycles in this process. Finally, I propose that the quadruplex world addresses most of the deficiencies of the RNA world hypothesis.

Properties of G3N Quadruplex

G3N Quadruplex is Highly Programmable

The term "programmable" in primordial evolution describes the ability of specific sequences to fold into designed/programmed tertiary structures capable of specific interactions with other molecules or of performing some catalytic activity (DeLuca et al. 2020; Seeman 2010). The DNA duplex's programmable nature is based not only on the high specificity of canonical Watson–Crick base pairs, but also on the strictly determined antiparallel alignment of strands. Thus, a tertiary structure should have strictly defined base recognition and strand alignment rules to be programmable.

G-tetrads (G•G•G•G) are square planar arrangements of guanines; each G directly interacts with adjacent Gs using both Watson–Crick and Hoogsteen bonding interfaces. In addition, each G has indirect (through cation) contact with the diagonally positioned G (Fig. 1A). It is not surprising that other nucleotides are incapable of such sophisticated interactions and that other homo or mixed tetrads (e.g., A•A•A•A, G•C•G•C, or G•G•G•T) strongly destabilize quadruplexes (Mergny and Sen 2019). Despite these strictly defined recognition/association rules, G-tetrads can adjust to any strand alignment (i.e., parallel, antiparallel and hybrid topologies) (Mergny and Sen 2019). The structural polymorphism is so strong that the topology of many monomolecular quadruplexes can depend on experimental conditions or even the history of sample preparation. One of the few monomolecular quadruplexes that demonstrates strict monomorphism is G3N, which contains all-parallel G-tracts and double chain-reversal (propeller) single-nucleotide loops (Fig. 1B-D) (Kelley et al. 2011).

Fig. 1

A G-tetrad with cation (red) in the center. B Nucleotide sequence of G3N (N is any base). C Three-dimensional (3D) representation of G3N quadruplex with all parallel G3 segments (red spheres) and the chain-reversal loops (black spheres); blue discs represent G-tetrads. D Schematic representation of G3N quadruplex with all parallel G-tracts (red) and chain-reversal T-loops (black curved lines)

G3T Quadruplex is Unusually Stable

The G3T (G3TG3TG3TG3) quadruplex is a truncated version of a DNA aptamer (G3T)4, selected for binding to HIV-1 integrase (Jing and Hogan 1998). Most of the initial studies performed using the quadruplex with T-loops, G3T, revealed that G3T is the most stable known structural motif in the entire nucleic acid world. For instance, even in the presence of 0.1 mM K+ (the most efficient quadruplex forming cation) it demonstrates cooperative and fully reversible melting curves at 55 °C and in 50 mM K+ it melts near 100 °C (Kankia et al. 2016; Kelley et al. 2011). The remarkable stability of the G3T quadruplex is attributed to the all-parallel alignment of GGG-tracts and the single-nucleotide propeller loops that are responsible for its structural monomorphism (Do et al. 2011; Kelley et al. 2011). Any modification of the G3T sequence, besides nucleotide substitutions in loop positions, is accompanied by strong destabilization and structural polymorphism (Kankia 2018).

RNA Analogue of G3T has the Same Topology

While DNA and RNA form double helices with significantly different geometry (B vs A-conformations), G3T and its RNA analogue adopt precisely the same tertiary structure (Joachimi et al. 2009; Kankia 2019). The RNA quadruplex is 13 °C more stable than G3T (Joachimi et al. 2009; Kankia 2019). However, not all DNA-to-RNA substitutions increase stability. Depending on the position, a substitution can have no measurable effect, stabilize the structure, or even destabilize it; the effects vary between -1.5 °C and 3.5 °C per substitution (Kankia 2019).

Tetrahelical Monomolecular DNA (tmDNA)

The tmDNA is a homopolymer consisting of n G3N domains, poly(G3N) or (G3N)n (Kankia 2014). The terminal G3-segments of adjacent G3T domains form G6-segments (Fig. 2A). Each G3N domain of the architecture is formed by zigzagging of G3-segments and N-loops, while the G6-segment is shared by adjacent domains and serves as a bridge between them (Fig. 2B-D). The G6-segments are responsible for the structure's vertical growth, while G3-segments and T-loops move DNA strands horizontally. The tmDNA folds rapidly and demonstrates even higher thermal stability than the G3N domain (Kankia 2014). The RNA analog of tmDNA, tmRNA, is capable of forming the same architecture, and all properties discussed below can be extrapolated to RNA (Kankia 2019).
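
The domain bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, assuming that one G3N domain is written as G3-N-G3-N-G3-N-G3 (15 nt) and that domains are joined head to tail, so that the terminal G3-segments of neighbours merge into a G6-segment. The function names `g3n_domain` and `tmdna_sequence` are hypothetical and do not come from the cited work.

```python
def g3n_domain(loop: str = "T") -> str:
    """Return one G3N domain, G3-N-G3-N-G3-N-G3 (15 nt), for a single-base loop."""
    if len(loop) != 1:
        raise ValueError("G3N uses single-nucleotide loops")
    return loop.join(["GGG"] * 4)


def tmdna_sequence(n: int, loop: str = "T") -> str:
    """Concatenate n G3N domains; adjacent terminal G3-segments become G6."""
    if n < 1:
        raise ValueError("need at least one domain")
    return g3n_domain(loop) * n


if __name__ == "__main__":
    print(tmdna_sequence(2))       # (G3T)2: 30 nt with one central G6-segment
    print(tmdna_sequence(2, "G"))  # G3G pattern: the same string as G30
```

With N = G the loops are no longer distinguishable in the sequence, so the polyguanine case differs from the T-loop case only in how the fold, not the string, is described.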

Fig. 2

A Nucleotide sequence of (G3N)2 (N is any base). B 3D representation of (G3N)2 quadruplex with all parallel G-tracts (red spheres) and the chain-reversal loops (black spheres); blue discs represent G-tetrads. C Surface of the 3D model of (G3N)2 unwrapped into 2D map. D Schematic representation of (G3N)2 quadruplex with all parallel G-tracts (red) and chain-reversal T-loops (black curved lines)

Polyguanine Folds into G3G-domain tmDNA Architecture

The tmDNA architecture can adopt only the G3N building pattern (GGG-tracts with single-nucleotide loops); any other pattern (e.g., G2N (G2NG2NG2NG2) or G4N2 (G4N2G4N2G4N2G4)) is incapable of forming a stable, uninterrupted tmDNA (Kankia 2018). As a result, polyguanine adopts a tmDNA structure using only the G3G building pattern (Kankia 2018). Thus, a non-specific polyguanine homopolymer can form a sophisticated tertiary structure with a strictly defined and highly programmable building pattern.

tmDNA is Highly Reproducible and Predictable

As shown earlier, G15, (G3T)2, or (G3T)7GGG do not show any misfolding after rapid cooling on ice (Kankia et al. 2016). The latter construct is a variant of (G3T)2 with T-insertion between G3T domains, G3T-T-G3T. Since it contains eight identical G3-segments, it is more inclined to misfold. For instance, it might form only one G3T domain in the middle of the construct with unstructured tails, i.e., G3TG3T-(G3T)-TG3TG3. However, it forms two perfectly folded G3T quadruplexes even upon rapid annealing on ice (Kankia 2018). Structural reversibility upon rapid annealing is uncommon for RNA, which requires careful annealing to restore initial structures.

tmDNA structures formed from sequences with A, T, or C loops can be programmed and predicted with 100% accuracy. The structure of polyguanine can also be predicted with 100% accuracy if its length equals n × 15 (e.g., G30, G45). If this is not the case, one can still predict the number of G3G domains in tmDNA, but not the precise positioning of the ends of the structure. For instance, G32 might fold into GG-(G3G)2, G-(G3G)2-G or (G3G)2-GG. However, G-hydrolysis at the loop positions will increase the predictability to 100% (see "Exposed Bases"). This kind of structure predictability is unusual for biopolymers. For instance, only short DNA duplexes with specifically designed Watson–Crick base pairs can be predicted with 100% accuracy, while the highest accuracy for proteins is around 80% (Heffernan et al. 2015).
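
The G32 example can be generalized with a small enumeration. The sketch below assumes, beyond what the text states, that any guanines left over after filling whole 15-nt G3G domains remain unstructured and are simply distributed between the 5' and 3' ends; `polyg_registers` is a hypothetical helper name.

```python
from typing import List

DOMAIN_LEN = 15  # one G3G domain: G3-G-G3-G-G3-G-G3


def polyg_registers(length: int) -> List[str]:
    """List possible end placements for a G-tract, in the notation used above.

    Assumption (not from the source): leftover guanines stay unstructured
    and are split between the 5' and 3' ends in every possible way.
    """
    n_domains, leftover = divmod(length, DOMAIN_LEN)
    if n_domains == 0:
        return []  # too short to form even one domain
    core = f"(G3G){n_domains}" if n_domains > 1 else "G3G"
    registers = []
    for five_prime in range(leftover, -1, -1):
        three_prime = leftover - five_prime
        parts = ["G" * five_prime, core, "G" * three_prime]
        registers.append("-".join(p for p in parts if p))
    return registers


if __name__ == "__main__":
    print(polyg_registers(32))
    print(polyg_registers(30))
```

Under this assumption a length that is an exact multiple of 15 yields a single register, which is the 100% predictable case described in the text.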

Exposed Bases

tmDNA includes fully exposed nucleic acid bases similar to the Pauling model of DNA (Pauling and Corey 1953). The exposed bases create specific advantages: (i) they could be used for intermolecular base-pairing without unfolding the structure; (ii) since they are completely exposed to solvent, the loop bases can easily hydrolyze from the sugar-phosphate backbone. This would transform polyguanine into G3 and G6 segments separated by abasic sites. As a result, the structure predictability will increase to 100% accuracy since the abasic sites represent the most favorable chain-reversal loops (Rachwal et al. 2007); (iii) the abasic sites can be used for incorporation of other bases (e.g., cytosine) without affecting the homochirality of the polymer (see "Origin of Homochirality").

High-affinity Interactions between tmDNA Molecules

The tmDNA architecture is capable of high-affinity intermolecular binding through quadruplex-and-Mg2+ connections (QMCs) (Kankia 2015). A QMC is based on shape complementarity and strong stacking interactions between the quadruplexes. Figure 3A shows a pair of QMC connectors: Sub A (blue), or (G3T)2 missing the last guanine at the 3'-end; and Sub B (red), a G3T monomer with one extra guanine at the 5'-end. Upon mixing, the substrates interact with each other and form an uninterrupted G3T trimer with one interface G-quartet, which serves as an interlocking mechanism and secures proper folding of the product G3T trimer. QMCs are characterized by attomolar binding affinity and dissociate only after removing excess ionic strength (i.e., removal of Mg2+ by EDTA) (Kankia 2015).

Fig. 3

A Design and B schematic of self-ligating (two-component) QMC system. C Ligation of the two-component system at 22 °C in 50 mM KCl, 10 mM MgCl2, 10 mM Tris, pH 8.8 visualized on 12% polyacrylamide denaturing gel stained with SYBR Gold. D Model of three-component QMC system using G-triplex (green) as a quadruplex catalyst

Catalytic Activity of tmDNA

The terminal guanines of Sub A and Sub B in the formed QMC complex (Fig. 3A-B) are juxtaposed, allowing non-enzymatic ligation in the presence of a carbodiimide phosphate-activating agent (e.g., EDC) (Kankia 2021). The ligation is extraordinarily robust and efficient: it can proceed in boiling solutions, and complete ligation is achieved within a minute. To demonstrate the spontaneous ligation of quadruplexes, the two-component QMC reaction (see Fig. 3A-B) was conducted in aqueous solution in the absence of EDC. Visible product formation is observed after 2 weeks (Fig. 3C). While the ligation of RNA in the absence of condensing agents is a well-established phenomenon, it is uncommon for DNA. The present result is likely attributable to the unusually high stability of the QMC complexes. A similarly high ligation efficiency is achieved using a quadruplex catalyst (i.e., G3TG3TG3 or G11) (see Fig. 3D). This system employs the QMC partners without a locking mechanism and is templated through the quadruplex catalyst, which demonstrates turnover activity upon temperature cycling (Kankia 2021).

Self-polymerization of Guanines

Template-free Production of Quadruplexes

Figure 4 shows a possible polymerization of polyguanines from free G-monomers. The feasibility of step 1 is based on experimental studies revealing that (i) free guanosines or GMPs form G-tetrads stacked on each other with helical parameters similar to those of quadruplexes (Chen et al. 2020; Wu and Kwan 2009; Zimmerman 1976); and (ii) 3',5'-cyclic GMP demonstrates non-enzymatic self-polymerization with canonical 5'-3' phosphodiester backbone formation (Costanzo et al. 2009; Morasch et al. 2014; Pino et al. 2008). The feasibility of steps 2 and 3 is demonstrated by ligation of pG10 in the presence of EDC, which results in polymers with lengths of at least a few thousand nucleotides (Kankia 2021). The feasibility of steps 5 and 6 is discussed in "High-affinity Interactions between tmDNA Molecules" and "Catalytic Activity of tmDNA".

Fig. 4

Steps in possible prebiotic polymerization of quadruplexes. Only G-tetrads (gray disks) and phosphodiester backbone (arrows) are shown. Curved lines (reactions 4–6) correspond to propeller loops formed by single Gs

The scheme shown in Fig. 4 employs the temperature cycles at steps 2 and 4. In step 2, temperature cycling facilitates the rearrangement of short multimolecular quadruplexes into the G-wires, and in step 4, longer multimolecular quadruplexes are transformed into monomolecular tmDNA. The importance of the temperature cycles in primordial evolution is discussed in the following section.

Role of Temperature Cycles in Quadruplex Formation

On the primordial Earth, the formation of biologically essential molecules was driven by fluctuations of physicochemical parameters (Salditt et al. 2020). For example, production of RNA molecules in the RNA world involves at least two steps: (i) replication of the template; and (ii) dissociation of the replicon to allow the next cycle of replication. Thus, after each replication cycle, the RNA duplex must be denatured/unfolded. This could have been achieved by the temperature cycles produced by day/night fluctuations. In the RNA world, these cycles can also have adverse effects: (i) RNA can easily misfold upon temperature cycling (Tinoco and Bustamante 1999); and (ii) if the replication session is longer than the temperature cycle, incomplete/nonfunctional RNA will be produced. In contrast, the temperature cycles fit perfectly into the quadruplex world since tmDNA does not misfold and has no length limitation. Moreover, the temperature cycles create additional opportunities in the quadruplex world (see the following sections).

The Temperature Cycles Can Overpower Detrimental Hydrolysis

One of the main barriers to abiotic nucleic acid polymerization/condensation (with release of water) is the opposite reaction, hydrolysis. In aqueous solution, an unstructured nucleic acid ultimately hydrolyzes to its components. To produce a polymer, the rate of condensation should be faster than the rate of hydrolysis. In other words, hydrolysis can be overpowered by increasing the rate of condensation. In the absence of a catalyst, this can be achieved by temperature elevation, which helps the reactants overcome the activation energy barrier of the condensation. While temperature elevation would dissociate/diffuse nucleotide aggregates and unfold RNA, G-tetrads can survive high temperatures (Smith et al. 2018). Thus, the high thermal stability of G-tetrads and quadruplexes shifts the hydrolysis-condensation equilibrium towards condensation. In addition, wet-dry cycles, which can be a consequence of the heating/cooling cycles, further accelerate condensation by removing water molecules, including those released by condensation (Deamer and Weber 2010). Thus, the quadruplex world can be easily powered by the most prominent and available form of energy, heat.
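
The temperature dependence invoked above can be illustrated with the Arrhenius relation, k = A·exp(-Ea/RT). The sketch below is not taken from the source: the barrier height and the two temperatures are hypothetical values chosen only for illustration, the pre-exponential factor A is assumed to be temperature independent, and `arrhenius_speedup` is a made-up helper name.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def arrhenius_speedup(ea_j_per_mol: float, t_cold_k: float, t_hot_k: float) -> float:
    """Ratio k(t_hot) / k(t_cold) from k = A * exp(-Ea / (R * T)), A assumed constant."""
    return math.exp(-ea_j_per_mol / R * (1.0 / t_hot_k - 1.0 / t_cold_k))


if __name__ == "__main__":
    # Hypothetical 100 kJ/mol barrier, 25 C versus 100 C (illustrative values only).
    ratio = arrhenius_speedup(100e3, 298.15, 373.15)
    print(f"rate constant increases about {ratio:.0f}-fold")
```

The same expression applies to hydrolysis, so the net balance depends on the two barriers and on how well the folded structure is protected, neither of which is quantified here.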

The Temperature Cycles Can Serve as an Enrichment Tool for Quadruplexes

Since quadruplexes are based on the self-recognition of guanines, G-oligomers of equal length can usually slip against each other and form long multimolecular G-wires (Marsh and Henderson 1994; Marsh et al. 1995; Protozanova and Macgregor 1996, 2000) (Fig. 4). This feature, which is a nuisance in the design of programmable quadruplexes, could be beneficial during abiogenesis when combined with the heating/cooling cycles. For instance, it could serve to purify quadruplexes from impurities such as other nucleotides or L-G enantiomers. This could play a critical role in breaking the chiral symmetry of nucleic acids (see "Origin of Homochirality").

Shortcomings of RNA World Hypothesis

Polymerization of the Very First Polynucleotide

There are two main barriers to polymerization of the RNA template: hydrolysis (Deamer and Weber 2010; Runnels et al. 2018) and cyclization (Cairns-Smith 1977, 2008; Ferris et al. 1996; Horowitz et al. 2010). Both barriers are removed in the quadruplex world. The hydrolysis problem is discussed in "The Temperature Cycles Can Overpower Detrimental Hydrolysis".

Specific experimental conditions have been created to overcome the problem of cyclization, albeit with limited success. For instance, polymerization on mineral surfaces (Cairns-Smith 1977, 2008; Ferris et al. 1996) or polymerization in the presence of intercalators (Horowitz et al. 2010) allows the formation of RNA strands up to 100 nt in length. As shown in Fig. 4, the quadruplex world easily overcomes the problem: pG10 assembles into nanowires that, upon EDC-activated ligation, form polymers thousands of nucleotides in length (Kankia 2021). The ligation is very efficient and fast, and can be conducted even at the boiling point.

Origin of Homochirality

Another problem of the RNA world is enantiomeric cross-inhibition, whereby template-directed polymerization involving one enantiomer of RNA is inhibited by the presence of the other enantiomer (Joyce et al. 1984, 1987). While the chiral selectivity of the replication was striking (poly(D-C)-directed oligomerization of D-G was far more efficient than that of L-G), in reactions with a racemic mixture, monomers of the opposite handedness to the template, L-G, strongly inhibit polymerization by incorporating as chain terminators (Joyce et al. 1984). It must be noted that in addition to the chimeric products, a significant amount of short oligo(D-G)s was found even after seven days of reaction, but it was suggested that "after extensive oligomerization, nearly all chains would have an L-G residue at the 2'(3') terminus" (Joyce et al. 1984).

The quadruplex world can employ temperature cycles to break chiral symmetry. It was shown that G-monomers (nucleosides and GMPs) are capable of forming G-tetrad stacks with the helical properties of right-handed quadruplexes (30° rotation and 3.4 Å rise) (Chen et al. 2020; Wu and Kwan 2009; Zimmerman 1976). It must be emphasized that these helical stacks are not constrained by a covalently bonded backbone and are free to form both left- and right-handed structures, but they favor the right-handed helical stacks. Therefore, adjacent D-Gs in the G-tetrad stacks can ligate to each other through the 5'-3' phosphodiester backbone with thermodynamically favorable anti-glycosidic bonds. The L-G monomers incorporated in the right-handed helix need to adopt the syn conformation to have 5'-3' orientation (Joyce et al. 1984). However, this conformation is significantly less favorable for DNA quadruplexes and almost impossible for RNA quadruplexes (Fay et al. 2017). Therefore, L-G monomers would terminate polymerization by (i) not ligating to oligo(D-G)s or (ii) ligating and capping them at the 3' terminus (see chimeric oligos in Fig. 5). Thus, the system would contain some amount of oligo(D-G)s with proper backbone linkages, which can be dissociated from the mixed stacks at the heating step and reassembled into the thermodynamically more favorable homogeneous G-wires at the cooling step (Fig. 5).

Fig. 5

Purification of quadruplexes by the heating/cooling cycles (T-cycles). Separation of oligo(D-G)s (blue arrows) from chimeric oligos (containing both D-Gs and L-Gs) (blue lines with red squares) and formation of right-handed G-wires followed by ligation

RNA-to-DNA Transformation

The RNA world hypothesis postulates that life was initiated by RNA, which later was transformed into a DNA/RNA world by step-by-step incorporation of deoxynucleotides into RNA duplexes. However, RNA and DNA double helices are significantly different (A- vs. B-conformation), which raises significant thermodynamic penalties upon RNA-to-DNA and DNA-to-RNA transformations (Gavette et al. 2016). In contrast, the DNA and RNA quadruplexes fold into exactly the same tertiary structure, making the transformations in both directions free from significant thermodynamic penalties (see "RNA Analogue of G3T has the Same Topology") (Kankia 2019; Zhou et al. 2017).

The Quadruplex World and Origin of the Genetic Code

One significant deficiency of the RNA world hypothesis is that it represents an isolated system incapable of evolving into modern day ribosome-based decoding and translation (Bowman et al. 2015; Koonin 2012; Orgel 2003). It is also unclear how the genetic code emerged and evolved in the RNA world.

The continuity principle assumes incremental expansion of a system while preserving earlier versions of the system and their functionality. According to this principle, earlier versions of the genetic code should be preserved in the present universal code (Hartman 1975; Jukes 1974; Kocherlakota and Acland 1982). The most prominent earlier version is the fully degenerate part of the universal code, in which the nucleotide in the third position of a base triplet plays no role. Eight amino acids (Gly, Ala, Pro, Arg, Ser, Val, Leu and Thr) are involved in this code. Remarkably, all of these amino acids except positively charged Arg are among the smallest and simplest, and the most frequent nucleotides in the meaningful first and second codon positions are C and G (44% C, 31% G, 19% U and only 6% A). Based on the simplicity of the amino acids and the nucleotide frequency, a step-by-step evolution of the degenerate code was suggested earlier (Hartman 1975; Jukes 1974; Kocherlakota and Acland 1982), which is consistent with the quadruplex world hypothesis.
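
The eight fully degenerate codon families and the quoted base frequencies can be recomputed directly from the standard genetic code. The following Python sketch does this; the helper names `fourfold_boxes` and `prefix_base_percentages` are hypothetical, and the codon table is the modern standard code written with U for the RNA alphabet.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... (third base varies fastest)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def fourfold_boxes() -> dict:
    """Map each two-base prefix whose four codons encode one amino acid to that amino acid."""
    boxes = {}
    for b1, b2 in product(BASES, repeat=2):
        encoded = {CODE[b1 + b2 + b3] for b3 in BASES}
        if len(encoded) == 1:
            boxes[b1 + b2] = encoded.pop()
    return boxes


def prefix_base_percentages() -> dict:
    """Percentage of each base over the first and second positions of those prefixes."""
    counts = Counter("".join(fourfold_boxes()))
    total = sum(counts.values())
    return {base: 100 * counts[base] / total for base in BASES}


if __name__ == "__main__":
    print(fourfold_boxes())
    print(prefix_base_percentages())
```

Counting over the sixteen prefix positions gives 7 C, 5 G, 3 U and 1 A, which rounds to the 44%, 31%, 19% and 6% quoted above.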

The code initiates as the interaction between polyglycine and polyguanine (Hartman 1975). Glycine is not just the simplest amino acid, having a single hydrogen atom as its side chain, but it is also the only achiral amino acid, which eliminates the possibility of enantiomeric cross-inhibition upon polymerization (see "Origin of Homochirality"). Glycine could self-polymerize, fold, and then interact with tmDNA, or it could use tmDNA as a scaffold for polymerization. The polymerization might take place at the surface of the structured tetrahelix or at exposed bases of single-stranded polyguanines. Interestingly, glycine appears to destroy stacks of G-tetrads, while alanine has almost no effect (Detellier and Laszlo 1980). This supports the specific interaction of glycine with guanine bases at the atomic groups involved in G-tetrad formation. Thus, I propose that non-hereditary polyguanine and achiral polyglycine coexisted at the initial stage of molecular evolution and represented the first nucleic acid-protein systems.

The next step of the code expansion involved adding cytosine and three chiral amino acids (Ala, Pro, and Arg) (Hartman 1975; Kocherlakota and Acland 1982) (Fig. 6). This was accompanied by the introduction of the G•C complementarity principle, and only at this point did the nucleic acids become the hereditary/genetic material. The code expansion initiated a transition from the quadruplex world to double-helical nucleic acids, which requires addressing the following questions.

  • First, what was the mechanism of cytosine incorporation into polyguanines and what was the sequence of the first nucleic acid template? The cytosine bases could have been incorporated into polyguanines by conjugating at the abasic sites formed by spontaneous depurination of loop nucleotides (see "Exposed Bases"). The resulting strand would have a specific sequence dictated by tmDNA structure: (G3CG3CG6C)n (see Fig. 7A). The replication of this sequence (probably in the presence of cations that are unfavorable for quadruplex formation, such as Li+ or Cs+) would produce the first double-helical polymer (Fig. 7A, bottom duplex). Another version of the first double helix could have been (G3C)n. As mentioned in "Role of Temperature Cycles in Quadruplex Formation", temperature cycles could result in untimely termination of the replication process. Since the sequence contains short repeating G3C units, some replicons of the previous cycles could bind to the template at various places and continue elongation during subsequent cycles (Fig. 7B). As a result, (C3G)n and (G3C)n sequences would produce long repetitive duplexes (Fig. 7B). Interestingly, the majority of telomeric sequences represent GGG segments separated by a few nucleotides and might represent vestiges of (G3C)n sequences. Frameshifting affects the polypeptide sequence encoded by (G3CG3CG6C)n, while it has no effect in the case of (G3C)n, which is always translated into (Gly-Arg-Ala-Gly)n (Fig. 8).

  • Second, how is the chiral homogeneity of quadruplexes maintained in the context of GC-containing duplexes? As mentioned above, cytosine bases could have been inserted into all-G sequences by conjugating at already existing D-sugar-phosphate groups of polyguanines at abasic sites. However, the template-directed polymerization (see Fig. 7B) could proceed through association of free bases to the template followed by conjugation to D-sugar-phosphate chains produced by depurination of polyguanines.

  • Third, since amino acids Ala, Pro, and Arg are chiral, what is the mechanism that ensures incorporation of only L-amino acids in polypeptides? It was shown earlier that D-RNA prefers interaction with L-amino acids (Root-Bernstein 2007; Seligmann 2020).
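
The frameshift argument raised in the first question above can be checked mechanically. The sketch below translates one full period of a repeating sequence in each of the three reading frames and compares the resulting peptide repeats up to cyclic permutation. It assumes the modern standard genetic code, which is itself an assumption for a primordial system, and the helper names `translate_repeat` and `canonical_rotation` are hypothetical.

```python
from itertools import product
from math import gcd

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... (third base varies fastest)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_repeat(unit: str, frame: int) -> str:
    """Translate one period of the repeat (unit)n, reading from offset `frame` (0, 1 or 2)."""
    period = len(unit) * 3 // gcd(len(unit), 3)  # nucleotides until the codon pattern recurs
    seq = unit * (period // len(unit) + 2)       # long enough to cover any frame offset
    window = seq[frame:frame + period]
    return "".join(CODE[window[i:i + 3]] for i in range(0, period, 3))


def canonical_rotation(peptide: str) -> str:
    """Smallest rotation, so cyclic permutations of the same repeat compare equal."""
    return min(peptide[i:] + peptide[:i] for i in range(len(peptide)))


if __name__ == "__main__":
    for unit in ("GGGC", "GGGCGGGCGGGGGGC"):
        frames = [translate_repeat(unit, f) for f in range(3)]
        classes = {canonical_rotation(p) for p in frames}
        print(unit, frames, "distinct repeats:", len(classes))
```

For the G3C unit all three frames give rotations of the same Gly-Arg-Ala-Gly repeat (one-letter GRAG), whereas the 15-nt unit, whose length is a multiple of three, gives more than one distinct repeat.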

Fig. 6

Transition of the quadruplex world into GC-containing duplex system

Fig. 7

A Formation of the GC-containing double helix with (G3CG3CG6C)n pattern from polyguanine through depurination of loop bases of tmDNA, cytosine incorporation and template-directed replication. B Formation of (G3C)n pattern from the (G3CG3CG6C)n sequence through partial replication interrupted by temperature cycles, primer relocations, strand cyclization and rolling circle replication (RCR). The G6 segments are shown in bold for clarity. Green color corresponds to the sequence synthesized following relocation due to the temperature cycle

Fig. 8

Impact of frameshifting on the polypeptides produced by (G3CG3CG6C)n A and (G3C)n B