A world beyond double-helical nucleic acids: the structural diversity of tetra-stranded G-quadruplexes

Nucleic acids can adopt various secondary structures including double-, triple-, and tetra-stranded helices that differ by the specific hydrogen bond mediated pairing pattern between their nucleobase constituents. Whereas double-helical DNA relies on Watson–Crick base pairing to play a prominent role in storing genetic information, G-quadruplexes are tetra-stranded structures that are formed by the association of guanine bases from G-rich DNA and RNA sequences. During the last few decades, G-quadruplexes have attracted considerable interest after the realization that they form and exert regulatory functions in vivo. In addition, quadruplex architectures have also been recognized as versatile and powerful tools in a growing number of technological applications. To appreciate the astonishing structural diversity of these tetra-stranded structures and to give some insight into basic interactions that govern their folding, this article gives an overview of quadruplex structures and rules associated with the formation of different topologies. A brief discussion will also focus on nonconventional quadruplexes as well as on general principles when targeting quadruplexes with ligands.


Introduction
The basic structural unit of G-quadruplexes was reported as early as 1962 by Gellert, Lipsett, and Davis when they investigated the structure of fibers obtained from a gel of concentrated guanylic acid (GMP) [1]. Their studies pointed to a planar hydrogen-bonded arrangement of four guanine (G) bases related to each other by an axis of fourfold symmetry (Fig. 1a). With a total of eight hydrogen bonds, such an arrangement is expected to be particularly stable and also allows for the formation of linear aggregates upon the stacking of tetrads driven by strong interactions of their large planar surface areas. Yet, the propensity of DNA or RNA oligonucleotides rich in guanine bases for the formation of corresponding G-quadruplexes with a core of stacked guanine tetrads became apparent only about three decades later when a growing number of high-resolution structures became available.
Due to their potential formation under physiological conditions, studies on G-quadruplexes have surged in past years. However, their formation and existence in the genomes of eukaryotes but also prokaryotes and viruses have only recently been confirmed [2,3]. Various biostatistical analyses have estimated between 10⁵ and 10⁶ putative quadruplex-forming sequences in the human genome [4,5]. In addition to telomeric sequences, these are frequently found in promoter regions of oncogenic genes. There is little doubt that G-quadruplexes participate and contribute to many physiological processes and also pathological conditions [6,7]. These include the control of gene expression, the maintenance of genome integrity, cancer progression, and degenerative disorders, making G-quadruplexes attractive targets for therapeutic interventions [8–10]. In addition to their biological function and their significance for medicinal and pharmaceutical strategies, G-quadruplexes have increasingly been employed in various bio- and nanotechnological applications [11–13]. Thus, nucleic acid aptamers and DNAzymes are often based on a G-quadruplex scaffold. Also, various sensor systems for metabolites or metal ions, electronic switches, and nanostructures increasingly rely on the specific properties of G-quadruplexes. Clearly, the conformational variability and unique features of these non-canonical nucleic acids hold a lot of promise for their future exploitation.

G-quadruplex architecture
A G-tetrad or G-quartet with its square planar arrangement of four guanine bases constitutes the basic structural unit of a G-quadruplex. The Gs are held together by eight Hoogsteen hydrogen bonds involving imino and amino hydrogen bond donors at their Watson-Crick edge as well as O6 and N7 hydrogen bond acceptors at their Hoogsteen edge (Fig. 1a). Depending on a clockwise or anticlockwise orientation of hydrogen bonds when going from donor to acceptor sites, two different tetrad polarities can be distinguished. Typical quadruplexes consist of a G-core comprising two to four G-quartets that are stacked on each other in a helical arrangement; however, three-layered quadruplexes largely predominate (Fig. 1b, c). In such a stacked columnar arrangement, monovalent cations, in particular potassium or sodium ions, are coordinated within the central channel of the G-core that is lined with the guanine carbonyl oxygen atoms. Because of the negative potential within the inner channel, uptake of cations is essential for G-quadruplex stabilization. In general, due to their size and lower free energy of dehydration, potassium ions show a higher stabilizing effect when compared with smaller sodium ions.
Each of the four columns of the G-core comprising two to four guanine bases constitutes a run of consecutive guanosine residues in an oligonucleotide sequence. These G-tracts may either be located on four individual strands, pairwise combined in two strands with a single intervening sequence, or situated within a single strand separated by three intervening sequences to form a tetramolecular, bimolecular, or monomolecular G-quadruplex structure, respectively. In general, G residues within a G-column may be in an anti glycosidic conformation or, as a result of rotating the base around the glycosidic bond toward the sugar moiety, in a syn conformation. However, G-tetrad formation requires all guanine bases of the tetrad to be oriented the same way. Thus, with all G-tracts pointing into the same direction, G nucleotides participating in the tetrad arrangement must adopt the same glycosidic conformation (Fig. 2a). Although both all-anti and all-syn tetrads with opposite tetrad polarity are conceivable, all-syn tetrads formed by rotating all four guanine bases are significantly disfavored and only observed in rare cases as a consequence of small syn-anti conformational energy differences of the G nucleotide. By the same reasoning, residues of the same tetrad but located in antiparallel G-tracts, i.e., with opposite 5′-to-3′ orientation, must adopt different glycosidic torsion angles to conserve the hydrogen-bonded G-tetrad alignment (Fig. 2b).
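The coupling between strand direction and glycosidic conformation within one tetrad can be illustrated with a short check. The following is only a minimal sketch: the function name `tetrad_is_consistent` and the encoding of strand directions as '+'/'-' and of conformations as 'anti'/'syn' are assumptions made for this illustration.

```python
from itertools import combinations


def tetrad_is_consistent(orientations, conformations):
    """Check the glycosidic rule for a single G-tetrad.

    orientations:  four '+'/'-' symbols, one per G-column (5'-to-3' direction).
    conformations: four 'anti'/'syn' labels for the residues of the tetrad.
    (Both encodings are assumptions of this sketch.)
    """
    if len(orientations) != 4 or len(conformations) != 4:
        raise ValueError("a tetrad has exactly four residues")
    for i, j in combinations(range(4), 2):
        same_direction = orientations[i] == orientations[j]
        same_conformation = conformations[i] == conformations[j]
        # Parallel columns must share a conformation,
        # antiparallel columns must differ.
        if same_direction != same_conformation:
            return False
    return True


# All-parallel, all-anti tetrad (the situation of Fig. 2a).
print(tetrad_is_consistent("++++", ["anti"] * 4))
# Column III flipped and its residue switched to syn (the situation of Fig. 2b).
print(tetrad_is_consistent("++-+", ["anti", "anti", "syn", "anti"]))
```

The pairwise comparison reflects that the rule is relative: it fixes which residues must match or differ, not whether a given tetrad is all-anti or all-syn.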
In contrast to the constraints imposed on the relative glycosidic conformation of G residues in each single G-quartet, the pattern of glycosidic bond angles along a G-column is less restricted. These sequential glycosidic conformations determine the polarity of stacked tetrads, resulting in homopolar or heteropolar stacking interactions (Fig. 3). Note, however, that glycosidic conformations along one G-tract determine corresponding conformations of the other three G-tracts in response to their parallel or antiparallel orientation. Because stacking of anti-anti or syn-anti steps is favored over the stacking of anti-syn and syn-syn steps, the preferred syn/anti pattern along the four G-tracts tends to minimize the number of disfavored anti-syn and syn-syn steps [14].
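The preference for particular steps can be made concrete by counting favored and disfavored steps along a G-tract. The sketch below assumes that a step is read in the 5′-to-3′ direction and that a tract is given as a list of 'anti'/'syn' labels; the function name `count_steps` is hypothetical, and no energies are implied by the counts.

```python
# Step classification as stated in the text; steps are assumed to be
# read 5'-to-3' (an assumption of this sketch).
FAVORED = {("anti", "anti"), ("syn", "anti")}
DISFAVORED = {("anti", "syn"), ("syn", "syn")}


def count_steps(tract):
    """Return (favored, disfavored) step counts for one G-tract."""
    if any(residue not in ("anti", "syn") for residue in tract):
        raise ValueError("residues must be labeled 'anti' or 'syn'")
    favored = disfavored = 0
    for step in zip(tract, tract[1:]):
        if step in FAVORED:
            favored += 1
        else:
            disfavored += 1
    return favored, disfavored


for tract in (["syn", "anti", "anti"], ["anti", "syn", "anti"], ["syn", "syn", "anti"]):
    print("-".join(tract), count_steps(tract))
```

Under these assumptions, a syn-anti-anti tract contains only favored steps, whereas anti-syn-anti and syn-syn-anti tracts each contain one disfavored step.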
Finally, tetra-stranded quadruplexes feature four grooves separated by the sugar-phosphate backbone of the four G-columns (Fig. 2). These are of equal medium width in the case of four parallel G-columns with the same glycosidic conformation for residues in the same G-tetrad. For an antiparallel G-tract alignment, mixed syn/anti conformations of residues comprising a tetrad will generate different groove dimensions with additional narrow and wide grooves. Following hydrogen bonds from donor to acceptor, residues in an antiG → antiG or synG → synG alignment will form a medium groove, whereas antiG → synG and synG → antiG alignments will give rise to wide and narrow grooves, respectively.
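The groove rule lends itself to a small lookup. In the following sketch, a tetrad is assumed to be given as four 'anti'/'syn' labels listed in the order obtained when following hydrogen bonds from donor to acceptor around the tetrad; the function name `groove_widths` and this input convention are illustrative assumptions.

```python
def groove_widths(tetrad):
    """Return the four groove types of a tetrad.

    tetrad: four 'anti'/'syn' labels ordered from hydrogen bond donor to
    acceptor around the tetrad (an assumed input convention). Groove k lies
    between residue k and residue k + 1, with the last residue pairing
    back to the first.
    """
    if len(tetrad) != 4 or any(r not in ("anti", "syn") for r in tetrad):
        raise ValueError("expected four 'anti'/'syn' labels")
    widths = []
    for k in range(4):
        donor, acceptor = tetrad[k], tetrad[(k + 1) % 4]
        if donor == acceptor:
            widths.append("medium")   # anti -> anti or syn -> syn
        elif donor == "anti":
            widths.append("wide")     # anti -> syn
        else:
            widths.append("narrow")   # syn -> anti
    return widths


print(groove_widths(["anti", "anti", "anti", "anti"]))
print(groove_widths(["anti", "anti", "syn", "anti"]))
```

Because each change from anti to syn must be followed by a change back when going once around the tetrad, wide and narrow grooves always appear in equal numbers in this scheme.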

G-quadruplex topologies
Fig. 2 a G-tetrad with all guanosines in an anti glycosidic torsion angle conformation and all G-columns I-IV with the same 5′-3′ backbone orientation (+, running from top to bottom). b G-tetrad with a flipped antiparallel column III (−, running from bottom to top) with its G residue adopting a syn conformation to maintain hydrogen bonding (red). The four grooves are either narrow, medium, or wide, depending on a syn-anti, anti-anti (syn-syn), or anti-syn pattern of adjacent residues within the G-quartet when going from H-bond donor to H-bond acceptor
Compared with tetramolecular and bimolecular structures, G-quadruplexes formed upon the folding of a single G-rich oligonucleotide have received most attention due to their high relevance not only in biological systems but also in the design of non-natural quadruplex architectures. Typical quadruplex-forming sequences comprise four G-tracts of ≥ 3 consecutive G residues separated by short intervening sequences Nₓ. Consequently, conservative search algorithms have been based on a consensus sequence motif d(G₃₊N₁₋₇G₃₊N₁₋₇G₃₊N₁₋₇G₃₊) for predicting putative quadruplex structures in genomic DNA [15]. Apparently, the four G-columns constituting the quadruplex core are connected by the three intervening sequences that either link two adjacent parallel strands, two adjacent antiparallel strands, or two non-adjacent antiparallel strands to form a propeller (double-chain reversal) loop, a lateral (edge-wise) loop, and a diagonal loop, respectively (Fig. 4). Geometric considerations demand different minimal lengths for the loop-forming segments with typically only a single nucleotide for a most stable propeller loop in a three-tetrad quadruplex, more than one or two nucleotides in the case of lateral loops when bridging a narrow or wide groove, and more than three nucleotides for a diagonal loop bridging distal edges of a G-tetrad.
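A search based on the consensus motif of four G-tracts of at least three guanines separated by intervening sequences of one to seven nucleotides can be sketched with a regular expression. This is a minimal illustration and not the algorithm of ref. [15]: the function name `find_putative_quadruplexes` is hypothetical, only the given strand is scanned, matches are non-overlapping, and the intervening sequences are assumed to allow any of the four DNA bases.

```python
import re

# Four runs of >= 3 G separated by three intervening sequences of 1-7 nt.
# Allowing any base (including G) in the intervening sequences is an
# assumption of this sketch.
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_putative_quadruplexes(sequence):
    """Return (start, end, match) tuples for non-overlapping motif hits.

    Positions are 0-based with an exclusive end, as in Python slicing.
    Only the strand passed in is searched; the complementary (C-rich)
    strand would have to be scanned separately.
    """
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(sequence)]


# Human telomeric repeat sequence flanked by two arbitrary residues per side.
hits = find_putative_quadruplexes("AA" + "GGGTTAGGGTTAGGGTTAGGG" + "TT")
for start, end, match in hits:
    print(start, end, match)
```

A hit only indicates that the sequence satisfies the consensus motif; it says nothing about the topology adopted or about whether a quadruplex actually forms.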
In theory, there are 3³ = 27 possible loop combinations in the presence of three loops, resulting in quadruplexes of different topologies. Additional topologies derive from a clockwise (+) or anticlockwise (−) progression of propeller and lateral loops in a common frame of reference with the 5′-terminus placed at the lower right corner. It easily becomes apparent that only a fraction of the theoretical folds can be realized for geometric reasons, e.g., topologies with two consecutive diagonal loops are clearly impossible as are two consecutive lateral loops running clockwise and counter-clockwise. For characterizing and discriminating the various folds, a simple descriptor composed of the type and progression of consecutive loops may be used [16,17]. Thus, abbreviating lateral, propeller, and diagonal loops with "l," "p," and "d," respectively, the designation (+ld −p) describes a quadruplex topology with a first lateral loop at the 5′-end running clockwise, a central diagonal loop, and a third propeller loop at the 3′-end running counter-clockwise. Notably, whereas extended geometric descriptors can be used, the simple description of quadruplex topologies as introduced above does not include information on the number of G-tetrads and the geometry of grooves, nor does it discriminate among different patterns of glycosidic bond angles along the individual G-columns.
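The counting argument can be reproduced by brute-force enumeration. The sketch below lists the 27 unsigned loop combinations and then applies only the two exclusions named in the text (two consecutive diagonal loops, and two consecutive lateral loops of opposite progression) to the signed descriptors. All function names are hypothetical, an ASCII '-' stands in for the minus sign, and the filter is deliberately incomplete: the set of geometrically feasible folds is smaller than what passes here.

```python
from itertools import product

LOOP_TYPES = ("l", "p", "d")
# Propeller and lateral loops carry a progression sign, diagonal loops do not.
SIGNED_LOOPS = ("+l", "-l", "+p", "-p", "d")


def unsigned_combinations():
    """All combinations of three loop types, ignoring progression."""
    return list(product(LOOP_TYPES, repeat=3))


def violates_stated_rules(loops):
    """Apply only the two geometric exclusions mentioned in the text."""
    for first, second in zip(loops, loops[1:]):
        if first == "d" and second == "d":
            return True  # two consecutive diagonal loops
        if {first, second} == {"+l", "-l"}:
            return True  # consecutive lateral loops of opposite progression
    return False


def candidate_descriptors():
    """Signed three-loop descriptors that pass the two stated filters."""
    return [c for c in product(SIGNED_LOOPS, repeat=3) if not violates_stated_rules(c)]


def format_descriptor(loops):
    """Render a loop tuple in a descriptor-like notation, e.g. (+l d -p)."""
    return "(" + " ".join(loops) + ")"


print(len(unsigned_combinations()))   # 27 loop-type combinations
print(len(candidate_descriptors()))   # signed combinations left after the two filters
print(format_descriptor(("+l", "d", "-p")))
```

Of the 5³ = 125 signed combinations, 98 remain after these two filters alone, which illustrates how much of the pruning must come from further geometric constraints not encoded in this sketch.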
There are three major families of conventional G-quadruplexes (Fig. 5). A parallel topology as almost exclusively observed for RNA quadruplexes is based on four G-tracts oriented in the same direction. Here, all G residues of the G-core adopt the same glycosidic conformation, being anti in nearly all cases. G-quadruplexes of a (3 + 1) hybrid-type comprise one antiparallel and three parallel G-columns, frequently realized by (−p −l −l) and (−l −l −p) topologies termed hybrid-1 and hybrid-2, respectively. Finally, quadruplexes with an antiparallel topology feature pairs of parallel and antiparallel G-columns. Here, a G-column may have two adjacent antiparallel G-tracts as in the chairlike (+l +l +l) topology or both a parallel and an antiparallel neighboring column as in the basket-type (−ld +l) topology.
Fig. 3 […] Note that sequential glycosidic conformations along one G-tract determine conformations of the other three G-columns for given strand polarities. Based on the number of (dis)favored steps, the conformation in b tends to be preferred. Cyclic arrows indicate tetrad polarities, whereas gray and red rectangles represent anti and syn G residues, respectively
Fig. 4 General sequence of an intramolecular G-quadruplex (bottom) and schematic representation of its folding into a three-layered G-quadruplex (n = 3) with a topology described by (−pd +l); the first intervening sequence L1 forms a propeller loop running counterclockwise (−p, red) whereas L2 forms a diagonal (d, green) and L3 a lateral loop running clockwise (+l, blue)
Fig. 5 a Schematic representation of different G-quadruplex topologies and their notation for loop progression. Note that the two antiparallel chair-type folds (−l −l −l) and (+l +l +l) are topologically different. b Representative three-dimensional solution structures. Three-tetrad parallel quadruplex (−p −p −p) with three propeller loops (left, core residues shown in red, pdb ID: 1XAV); two-tetrad chair-type quadruplex (+l +l +l) with three lateral loops (center, core residues shown in green, pdb ID: 2LYG); three-tetrad basket-type quadruplex (−ld +l) with two lateral and one diagonal loop (right, core residues shown in yellow, pdb ID: 143D)
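The assignment of a loop descriptor to one of the three families follows from how each loop type connects G-columns: a propeller loop links parallel strands, whereas lateral and diagonal loops link antiparallel strands. The sketch below derives the four strand directions from a three-loop descriptor and counts them. The function names and the tuple encoding with an ASCII '-' are assumptions of this illustration, and the progression sign is ignored because it does not affect the family.

```python
def strand_orientations(loops):
    """Derive the directions (+1/-1) of the four G-columns from three loops."""
    orientations = [1]  # the first G-tract defines the reference direction
    for loop in loops:
        kind = loop.lstrip("+-")
        if kind == "p":
            # Propeller loop: the next column runs parallel to the previous one.
            orientations.append(orientations[-1])
        elif kind in ("l", "d"):
            # Lateral and diagonal loops: the next column runs antiparallel.
            orientations.append(-orientations[-1])
        else:
            raise ValueError(f"unknown loop type: {loop!r}")
    return orientations


def classify_family(loops):
    """Classify a three-loop descriptor as parallel, (3 + 1) hybrid, or antiparallel."""
    if len(loops) != 3:
        raise ValueError("an intramolecular quadruplex has three loops")
    orientations = strand_orientations(loops)
    majority = max(orientations.count(1), orientations.count(-1))
    return {4: "parallel", 3: "(3 + 1) hybrid", 2: "antiparallel"}[majority]


for descriptor in (("-p", "-p", "-p"), ("-p", "-l", "-l"), ("-l", "-l", "-p"),
                   ("+l", "+l", "+l"), ("-l", "d", "+l")):
    print(descriptor, classify_family(descriptor))
```

This classification deliberately discards information: chair-type and basket-type folds both come out as antiparallel, and distinguishing them requires looking at which columns are adjacent.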

Stability and polymorphism of G-quadruplexes
From an energetic point of view, the thermodynamic stability of G-quadruplexes mostly derives from the stacking of tetrads with the coordination of cations. However, loop domains as well as 5′- and 3′-flanking sequences may significantly contribute to the quadruplex stability by tertiary interactions, often forming an additional capping structure on an outer G-tetrad. Capping structures or a dimerization by end-stacking are frequently observed phenomena and particularly important for adding stability to quadruplex scaffolds comprising only two tetrad layers [18].
In addition to overhang sequences, the type of coordinated cation as well as dehydrating effects from molecular crowding may significantly influence not only the G-quadruplex stability but also the folding pathway of a G-rich sequence. Compared with sodium ions, potassium ions are not only more stabilizing but also tend to promote folding into parallel topologies. Typically, parallel quadruplexes with short one-nucleotide propeller loops exhibit a considerable thermal stability, with melting temperatures often exceeding 90 °C in 120 mM K⁺ solutions. Various quadruplex topologies are often close in their thermodynamic stability and coexist in solution. Consequently, small changes in the flanking sequence or in external conditions can easily shift folding into another major topology.
As a most prominent example, the human telomeric sequence d[GGG(TTAGGG)₃] may adopt five different intramolecular G-quadruplex topologies under different experimental conditions (Fig. 6). Comprising a 5′-A flanking residue, it adopts an antiparallel basket-type G-quadruplex with a core of three stacked G-tetrads in a Na⁺ solution. With two pairs of anti-syn-anti and syn-anti-syn G-columns, it only features heteropolar tetrad stacking interactions. In a K⁺-containing crystal, the same sequence folds into a parallel G-quadruplex with three propeller loops, all-anti G residues, and exclusive homopolar tetrad stacking. In K⁺ solution, two (3 + 1) hybrid conformations, namely (3 + 1) hybrid-1 and (3 + 1) hybrid-2, have been identified for d[TAGGG(TTAGGG)₃] and d[TAGGG(TTAGGG)₃TT], respectively. Both structures have one propeller and two lateral loops but differ in their loop arrangement. With three syn-anti-anti and one syn-syn-anti G-column, tetrads interact through homopolar as well as heteropolar stacking. Finally, an antiparallel basket-type G-quadruplex termed hybrid Form 3 comprising only two G-tetrad layers of opposite polarity and with a third G of the four G-runs shifted into loop segments was formed by the human telomeric sequence with a 3′-T flanking residue in K⁺ solution. Here, extensive base pairing and stacking in the loops capping the G-tetrad core contribute to the ability of this two-layered structure to successfully compete with a three-tetrad architecture [19].

Nonconventional G-quadruplexes with broken G-columns
The structural diversity of G-quadruplexes further increases due to their ability to compensate for guanine deficiencies and to fold into nonconventional quadruplex structures with broken G-columns comprising noncontiguous Gs [20]. Thus, intramolecular three-tetrad quadruplexes may even form from sequences lacking four tracts of ≥ 3 consecutive guanosines, contradicting the typical consensus sequence. With a single G-run comprising only two G residues, a vacant site in an outer tetrad of the folded species may be occupied by an external guanine derivative present in solution. Filling of an empty G position can also be realized by an unusual backbone progression. Such a fold will depend on the particular G-rich sequence and result in quadruplexes with an interrupted G-tract. Short bulges protruding within a G-column are often tolerated in case of a split G-run, albeit compromising the quadruplex stability. In addition to the formation of intertwined or interlocked dimeric and multimeric structures, other unusual structural motifs allowing for stable quadruplex formation in the presence of a shortened G-tract include snapback and V-shaped loops (Fig. 7). Clearly, this variability in arriving at a thermodynamically stable fold combined with additional stabilizing interactions of loop and overhang residues represents a challenge for the sequence-based design and prediction of favored G-quadruplex topologies.

Quadruplex-ligand interactions
To date, a plethora of quadruplex-binding ligands have been designed and tested, also including macrocyclic and metalloorganic compounds [21,22]. Interest in quadruplex binders was greatly stimulated by their potential use as novel anticancer agents. Such use is mostly based on their ability to induce and stabilize quadruplexes at G-rich promoter regions of oncogenes and at telomeric ends, interfering with gene expression through the cellular transcription machinery and with telomerase-mediated telomere elongation in cancer cells. In fact, some quadruplex-binding ligands have been shown to be effective in targeting quadruplex-forming regions in vivo and to inhibit cancer cell growth. The quadruplex interacting drug quarfloxin is the first candidate to have entered phase II clinical trials for the treatment of neuroendocrine carcinomas [23].
A major issue when developing quadruplex ligands is their specificity and discriminating power for quadruplex structures in the presence of predominating double-helical DNA. Thus, the majority of quadruplex ligands feature a flat polycyclic ring system with a large aromatic surface area to optimize stacking on an outer G-quartet of the quadruplex. As a consequence, binding largely depends on strong π-π stacking interactions. Intercalation between tetrads, as frequently observed with duplex DNA, seems to be disfavored due to the energetic penalty for unwinding a tetra-stranded structure and for expulsion of the centrally coordinated metal ions. Overhang sequences, often forming a binding pocket, and additional interactions between side chains attached to the ligand aromatic core with the quadruplex backbone, loops, or inside quadruplex grooves may enforce binding and open up possibilities to selectively target particular quadruplex structures with minimal off-target effects (Fig. 8). Thus, quadruplex grooves with different geometry support more selectivity in binding, but in contrast to duplex DNA, only a few ligands have been demonstrated to exclusively interact within a quadruplex groove. Ongoing attempts at achieving better selectivity among quadruplexes by adding more specific interactions in addition to tetrad stacking are challenging but will ultimately offer great promise for in vivo applications of corresponding ligands.

Conclusions
G-quadruplexes have become one of the most intensely studied nucleic acid structures over the past two decades. Their existence across all organisms and biological role in vivo have sparked interest for these remarkable structures in biology as well as in medicine and pharmacology. On the other hand, G-quadruplexes have been shown to adopt an amazing number of different topologies, and their specific properties make them promising tools for various technological applications. We are currently seeing a growing understanding of critical interactions within G-quadruplex architectures. Numerous structural and thermodynamic studies on quadruplexes have yielded geometric formalisms for a description of their folds, and empirical rules have emerged for the propensity of G-rich sequences to favor a particular conformer in a given environment. Current efforts aim at a rational sequence-based design of quadruplex topologies and of high-affinity ligands interacting with quadruplexes through highly specific interactions. As a result, investigations on G-quadruplexes are anticipated to continue playing a growing role in many diverse areas of basic and applied science.
Funding Open Access funding enabled and organized by Projekt DEAL.

Conflict of interest
The authors declare no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.