Introduction

Much of modern biology is founded on the understanding of DNA and RNA as essential means for storage and expression of genetic information. Beyond the classical double helix structure of DNA first reported by Watson and Crick in 1953 [1], alternate conformations of DNA have since been discovered which have important implications regarding biological function. However, the formation and properties of these alternate DNA structures are still comparatively poorly understood, and their role and extent in gene regulation are still debated in the literature. Furthermore, there is continually growing interest in the exploitation of supramolecular DNA assemblies beyond base-paired duplex structures, particularly for their potential use in gene regulation and anti-gene therapies. Therefore, recent efforts have been directed towards gaining a molecular-level understanding of the way in which biomolecules assemble with DNA to regulate biological activity.

One example of a higher order DNA structure is the triplex, which occurs when a third oligonucleotide (commonly termed the triplex-forming oligonucleotide, TFO) binds to the triplex target sequence (TTS) within the major groove of double-stranded DNA. The triplex structure is specified by Hoogsteen hydrogen bonding rather than the classical Watson–Crick bonds of duplex DNA [2] (see Supplementary Figure 1). Due to the requirements of Hoogsteen bonding, a repeating purine (guanine and adenine, denoted R in standard DNA nomenclature) stretch is required in the TTS to form a DNA triplex structure. This is not merely an experimental observation but an intrinsic structural constraint. The triplex-forming oligonucleotide must in turn be either a complementary polypyrimidine (cytosine or thymine/uracil, denoted Y in standard DNA nomenclature) strand binding through Hoogsteen base pairing in a parallel manner, or a polypurine (R) strand binding through reverse Hoogsteen base pairing in an anti-parallel manner (Supplementary Figure 1). These two possible forms of triplex are represented as Y*R·Y and R*R·Y, respectively, where the asterisk indicates Hoogsteen-type base pairing between the central purine strand and the TFO and the dot represents Watson–Crick base pairing of the DNA duplex. A more detailed description of the nomenclature and possible triplex forms can be found in a recent review [3].

In 1987, Moser and Dervan were among the first to suggest the possibility of regulation of gene expression by triple-helix-forming DNA [4], and since then a range of biological consequences and therapeutic applications has been proposed for DNA triplexes [5]. Friedreich’s ataxia (FRDA) is the only human disease known to be associated with DNA triplex formation. In the vast majority of cases (> 90%), the disease results from an expansion of a GAA trinucleotide repeat in intron 1 of the FXN gene, which encodes the protein frataxin, from approximately 15 copies in healthy individuals up to 700–800 copies in FRDA patients [6]. The expanded purine repeat in this region decreases expression at the mRNA level, indicating repression of transcription. Thus, inadequate frataxin protein is produced, leading to widespread dysfunction of iron–sulfur center-containing enzymes, impaired iron metabolism, oxidative stress, and mitochondrial dysfunction, predominantly in the nervous system [7, 8].

Despite investigation both in vitro and in vivo, the mechanism by which the GAA expansion induces transcriptional inhibition in FRDA remains debated, with two non-exclusive models emerging that involve formation of non-B DNA conformations and/or heterochromatin-mediated gene silencing [9]. Further uncertainty remains in the context of the first model regarding the exact composition and structure of the higher order DNA structures formed. In vitro and in bacterial plasmids, GAA repeats of lengths comparable to pathological FRDA alleles are proposed to form an intramolecular triplex of two GAA strands and a CTT strand, with an unpaired CTT strand [9,10,11]. Furthermore, two R*R·Y triplexes can associate to form a novel DNA structure called “sticky DNA,” also demonstrated to inhibit transcription [12, 13]. Recently, a slightly different model has been proposed, with a hybrid RNA–DNA structure linked to polymerase arrest on templates that contain long GAA repeats [14]. Nevertheless, given the implication of triplexes in this disease, a potential therapeutic approach is the use of minor groove binders that could bind at triplex sites and selectively destabilize the TFO from the major groove. One such example is the polyamide netropsin, which has been reported to destabilize triplex while stabilizing duplex structure [15], predominantly binding in the minor groove opposing AT base pairs, which are abundant in the FRDA sequence [16,17,18].

To better develop treatments for FRDA, a more fundamental understanding of these triplex structures, and their interactions with small molecules, is required. Over recent decades, electrospray ionization (ESI) mass spectrometry has emerged as a powerful tool in the investigation of biomolecular complexes. ESI, as a soft ionization technique, is able to transfer fragile non-covalent assemblies from solution-phase to gas-phase with minimal fragmentation, while largely maintaining physiologically relevant structural information [19]. More recently, ion mobility-mass spectrometry (IM-MS), which enables investigation of the size and shape of ions by measuring the collision cross section (CCS), has enabled structural models of biomolecular assemblies to be derived from gas-phase experimental constraints [20, 21]. Studies of DNA double-strand complexes using ESI-MS have been widely reported in the literature [22,23,24,25], and IM-MS has also found recent applications in this field [26, 27]. However, studies of DNA triplexes remain comparatively limited, with the first report of a triplex structure by ESI-MS in 2002 [28]. More recently, Arcella et al. demonstrated that triplexes in the gas-phase can maintain their solution structures [29].

In this work, we use ESI-IM-MS in combination with UV-vis spectroscopy to better understand the formation and structural properties of DNA triplex structures of relevance to FRDA. We consider various repeat lengths, RNA and DNA strand compositions, and the importance of pH and demonstrate that, contrary to other reports, Y*R·Y triplexes with an apparent propensity for self-aggregation are favorably formed in vitro, as are hybrid RNA–DNA structures.

Experimental

Reagents

Desalted nucleic acid oligomers were synthesized on the 1 μmol scale. DNA-based (GAA)n and (TTC)n were purchased from GeneWorks (Adelaide, Australia) in 27mer and 36mer lengths. 18mer DNA (GAA)6 and (TTC)6, along with (CTT)6 and (CUU)6 for the DNA and RNA TFOs, respectively, were purchased from Sigma (Castle Hill, Australia). Ammonium acetate buffers were prepared from a stock 7.5 M ammonium acetate solution and pH adjusted with either acetic acid or ammonium hydroxide (≥ 99.99% purity) as necessary, all purchased from Sigma (Castle Hill, Australia). Netropsin (netropsin dihydrochloride), extracted from Streptomyces netropsis (≥ 90% purity), was purchased from Santa Cruz Biotechnology (Dallas, USA).

Triplex Formation

Reagents were purchased from Sigma (Castle Hill, Australia) unless otherwise specified. Nucleic acid duplexes and triplexes were formed by adding the constituent strands to the desired buffer such that the final concentration of each strand was 50 μM. Annealing was achieved by heating from 25 °C to 90 °C at a rate of 2 °C/min, incubating at 90 °C for 10 min, followed by cooling to 25 °C at a rate of 0.5 °C/min. Heating and cooling were performed in an AF-3 kiln (Woodrow, Australia). Prior to MS, samples were further desalted by dilution in buffer and re-concentration to approximately 100 μM by centrifugation in 10 kDa centrifugal filter units (Sartorius, Göttingen, Germany), as per the manufacturer’s protocols, before being diluted to 10 μM for analysis. Where indicated, MgAc2 was added at 10 molar equivalents, and unbound salts were then removed by centrifugation as specified above, before reconstituting oligonucleotides to 10 μM. For the TFO competition experiment, an equimolar mixture of both DNA and RNA TFOs was added to a solution of preformed duplex and incubated at 4 °C for 24 h before preparation for MS analysis. Treatment with netropsin was performed by adding netropsin to a triplex sample at 0.5 molar equivalents, followed by incubation at room temperature for 24 h. Unbound netropsin and associated contaminants were then removed by centrifugation using 10 kDa centrifugal filter units as specified above.
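The annealing program described above can be summarized as a simple schedule. The following Python sketch computes the duration of each stage from the stated temperatures and ramp rates; the function name and data layout are illustrative only and are not part of the original protocol.

```python
def annealing_schedule(start_c=25.0, hold_c=90.0, heat_rate=2.0,
                       hold_min=10.0, cool_rate=0.5):
    """Return (stage, minutes) pairs for a heat-hold-cool annealing program.

    Rates are in degrees C per minute; the defaults follow the values
    stated in the text.
    """
    span = hold_c - start_c  # temperature range covered by each ramp
    return [
        ("heat", span / heat_rate),
        ("hold", hold_min),
        ("cool", span / cool_rate),
    ]


stages = annealing_schedule()
total_min = sum(minutes for _, minutes in stages)
```

Under these values the slow cooling ramp dominates the total run time.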

Ultraviolet–Visible Spectroscopy

Triplex samples were prepared as described above and diluted to 2 μM in equivalent buffer for analysis. All spectra were recorded on a Cary 5000 UV-Vis-NIR spectrophotometer (Agilent Technologies) with integrated temperature controller. At pH 7.4, spectra were acquired at a wavelength of 260 nm, while additional spectra were obtained at a wavelength of 300 nm at pH 5.5 [30]. Melting temperatures were determined by heating to 90 °C, incubating at 90 °C for 10 min, then cooling to 25 °C, at a rate of 0.5 °C/min with absorbance recorded at every 0.1 °C increment. This melting/annealing cycle was repeated for each sample. Data were plotted as dA/dT vs temperature to observe melting/annealing events.
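As an illustration of the dA/dT analysis, the sketch below estimates melting temperatures as the maxima of a smoothed first-derivative profile. It is a minimal example under stated assumptions: the moving-average smoothing, its window size, and the simple peak picking are choices made here for illustration and are not taken from the original work; noisy experimental data may need stronger smoothing or a peak-prominence filter.

```python
import numpy as np


def melting_temperatures(temp_c, absorbance, n_transitions=1, smooth_pts=11):
    """Estimate Tm values as the largest maxima of the smoothed dA/dT curve.

    smooth_pts (odd) is the moving-average window; it is an assumption of
    this sketch, not a value from the text.
    """
    temp_c = np.asarray(temp_c, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)

    # Pad with edge values so the moving average does not distort the ends.
    half = smooth_pts // 2
    padded = np.pad(absorbance, half, mode="edge")
    kernel = np.ones(smooth_pts) / smooth_pts
    smoothed = np.convolve(padded, kernel, mode="valid")

    # First derivative of absorbance with respect to temperature.
    dadt = np.gradient(smoothed, temp_c)

    # Local maxima: larger than the left neighbour, not smaller than the right.
    idx = np.arange(1, len(dadt) - 1)
    peaks = idx[(dadt[idx] > dadt[idx - 1]) & (dadt[idx] >= dadt[idx + 1])]

    # Keep the n tallest peaks and report them in order of temperature.
    tallest = peaks[np.argsort(dadt[peaks])[::-1][:n_transitions]]
    return np.sort(temp_c[tallest]), dadt


# Synthetic two-transition curve (values chosen only to exercise the function).
temps = np.arange(25.0, 90.0 + 1e-9, 0.1)
curve = (0.20
         + 0.05 / (1.0 + np.exp(-(temps - 33.0) / 2.0))
         + 0.10 / (1.0 + np.exp(-(temps - 65.0) / 2.0)))
tm_values, _ = melting_temperatures(temps, curve, n_transitions=2)
```

For a triplex sample, `n_transitions=2` would correspond to TFO dissociation followed by duplex melting, provided the two events are resolved.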

Ion Mobility-Mass Spectrometry

Ion mobility and mass spectra were recorded using an Agilent 6560 Ion Mobility Q-ToF Mass Spectrometer (Agilent Technologies). Data were collected in positive ion mode using nanoESI from platinum-coated borosilicate capillaries prepared in-house. Parameters were optimized to maximize signal intensity while avoiding any analysis-induced structural transitions [31]. Typical instrument parameters included a capillary voltage of 1700 V, fragmentor voltage of 50 V, gas temperature of 75 °C, gas flow of 1.5 L/min, trap fill time of 10,000 μs, and trap release time of 2000 μs. CCS measurements were made using a multifield approach varying the IM drift tube voltage between 1200 and 1700 V. The acquired spectra were processed using Qualitative Analysis B.07.00 and IM-MS browser B.07.01 (both Agilent Technologies).
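The multifield approach can be sketched as follows. Arrival time is fitted linearly against the inverse drift voltage; the slope gives the ion mobility and the intercept gives the time spent outside the drift region, and the mobility is then converted to a CCS with the Mason–Schamp equation. This is a minimal sketch, not the vendor software's implementation. The drift tube length, pressure, temperature, ion mass, and the treatment of the quoted voltage as the full potential drop across the tube are all assumptions made here for illustration.

```python
import math

import numpy as np

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
AMU = 1.66053906660e-27      # atomic mass unit, kg
PA_PER_TORR = 133.322368


def mason_schamp_factor(charge, ion_mass_da, gas_mass_da,
                        pressure_torr, temperature_k):
    """Return the product (mobility * CCS) in SI units from Mason-Schamp."""
    number_density = pressure_torr * PA_PER_TORR / (K_B * temperature_k)
    reduced_mass = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * AMU
    return (3.0 * charge * E_CHARGE / (16.0 * number_density)
            * math.sqrt(2.0 * math.pi / (reduced_mass * K_B * temperature_k)))


def multifield_ccs(voltages_v, arrival_times_s, charge, ion_mass_da,
                   gas_mass_da=28.0134, tube_length_m=0.781,
                   pressure_torr=3.95, temperature_k=300.0):
    """Return (CCS in square angstroms, non-drift time in s).

    Tube length, pressure and temperature defaults are assumed values.
    """
    inv_v = 1.0 / np.asarray(voltages_v, dtype=float)
    times = np.asarray(arrival_times_s, dtype=float)
    # t_arrival = t_0 + L^2 / (K * V): linear in 1/V with slope L^2 / K.
    slope, intercept = np.polyfit(inv_v, times, 1)
    mobility = tube_length_m ** 2 / slope
    factor = mason_schamp_factor(charge, ion_mass_da, gas_mass_da,
                                 pressure_torr, temperature_k)
    return float(factor / mobility * 1e20), float(intercept)


# Synthetic check: build arrival times for a hypothetical 7+ ion of
# 16.5 kDa with a CCS of 1000 square angstroms, then recover the CCS.
voltages = np.array([1200.0, 1300.0, 1400.0, 1500.0, 1600.0, 1700.0])
true_ccs_a2, dead_time_s = 1000.0, 2.0e-3
factor = mason_schamp_factor(7, 16500.0, 28.0134, 3.95, 300.0)
true_mobility = factor / (true_ccs_a2 * 1e-20)
times = dead_time_s + 0.781 ** 2 / (true_mobility * voltages)
ccs_fit, t0_fit = multifield_ccs(voltages, times, charge=7, ion_mass_da=16500.0)
```

Because the fit uses several voltages, the non-drift contribution to the arrival time is separated out rather than assumed.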

Theoretical Collision Cross Section Calculations

Molecular dynamics (MD) simulations were conducted for DNA triplexes with duplex sequence d(AAG)n and parallel pyrimidine TFO with sequence d(TTC+)n in the gas-phase (i.e., in the absence of solvent). For idealized structures, ionized forms were generated by protonation of all cytosines at the N3 position and all phosphate groups. To study the effect of increasing charge on CCS, the experimentally observed charge states were specifically generated for the 27mer system through a combination of TFO cytosine, TFO adenine (at the N1 position), and phosphate group protonation. For the 8+ charge state, all TFO cytosines and all but one phosphate group (that belonging to the central residue of the TFO) were protonated. For the 9+ state, all TFO cytosines and all phosphate groups were protonated. Finally, for the 10+ state, all TFO cytosines and all phosphate groups were protonated, with the addition of an extra proton to a TFO adenine. Two different adenine positions were considered: a terminal residue and the central residue of the TFO.
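The charge bookkeeping for the 27mer models can be checked with a short calculation. The sketch below assumes that each N3-protonated TFO cytosine and each extra base proton contributes +1, that a protonated phosphate is neutral, that an unprotonated phosphate contributes −1, and that each strand of length n carries n − 1 phosphates (no terminal phosphates). The last point is an assumption of this example and is not stated in the text.

```python
def net_charge(tfo_sequence, n_strands=3, unprotonated_phosphates=0,
               extra_base_protons=0):
    """Net charge of a triplex model under the assumptions described above."""
    # Assumed: n - 1 backbone phosphates per strand (no terminal phosphates).
    total_phosphates = n_strands * (len(tfo_sequence) - 1)
    if not 0 <= unprotonated_phosphates <= total_phosphates:
        raise ValueError("unprotonated_phosphates out of range")
    protonated_cytosines = tfo_sequence.count("C")  # all TFO cytosines at N3
    return protonated_cytosines + extra_base_protons - unprotonated_phosphates


tfo_27mer = "TTC" * 9
charges = {
    "all but one phosphate protonated": net_charge(
        tfo_27mer, unprotonated_phosphates=1),
    "all phosphates protonated": net_charge(tfo_27mer),
    "plus one adenine proton": net_charge(tfo_27mer, extra_base_protons=1),
}
```

The three cases reproduce the 8+, 9+, and 10+ states described in the text, since the 27mer TFO contains nine cytosines.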

The BSC0 modification to the AMBER parm99 force field [32] was used for all simulations. Parameters for N3-protonated cytosines and N1-protonated adenines were obtained from the literature [33, 34]. It should be noted that charges for protonated adenine have only been parameterized for RNA, rather than DNA; hence, only the charges on atoms of the nucleobase from this parameterization were used, with the sugar and phosphate groups remaining consistent with the BSC0 force field for DNA. The difference in charge was absorbed by the adenine N9 atom. The charges of the hydrogens of the protonated phosphates were taken from the BSC0 parameters for terminal phosphates with the adjoining oxygen modified such that the nucleoside remained neutral. Bonded parameters were obtained from the literature [35].

Simulations were run using NAMD [36] in the NVT ensemble at a temperature of 300 K. The temperature was maintained using a Langevin thermostat. No cutoff was used for either the Lennard-Jones or electrostatic interactions, and hydrogen-containing bonds were constrained using SHAKE [37]. Systems were initially energy minimized and then run with a time-step of 1 fs until the root-mean-square deviation (RMSD) of the atom positions from their initial values plateaued (between 500 and 650 ns, see Supplementary Figure 5). The CCS values of these structures were then calculated at 5 ns intervals using the trajectory method approximation within IMPACT [38], and the average value (each over 50 runs) is reported here. To account for the use of N2 as a drift gas, the radius of the gas probe was set to 1.68 (according to the ratio of effective radius of He and N2 reported elsewhere [39]).

Results and Discussion

Y*R·Y Triplex Structures Are Favored for Simple GAA·TTC Sequences

Triplex formation was investigated using simple oligonucleotide sequences of relevance to FRDA, containing the GAA triplet repeat (and complementary CTT sequence) in 6, 9, and 12 repeat lengths (referred to hereafter as 18mer, 27mer, and 36mer oligonucleotides). Attempts to form both R*R·Y- and Y*R·Y-type triplexes were first made by adding the three composite strands in 2:1 purine:pyrimidine or 1:2 purine:pyrimidine ratios, respectively, at pH 7.4. Interestingly, under these conditions, triplex assemblies, as detected by native mass spectrometry, were observed almost exclusively for the Y*R·Y type for all sequence lengths. The mass spectrum for the 18mer oligonucleotides is presented in Figure 1, which shows that in the case of excess pyrimidine strand, ions corresponding specifically to triplex assemblies are observed with charge states centered around 7+, in addition to the DNA duplex (Figure 1A). Also noted here are low-abundance signals corresponding to a species with mass twice that of the triplex, or in other words, a “triplex dimer.” In contrast, for the R*R·Y case (Figure 1B), only duplex and free R strand are observed. Similar results showing only Y*R·Y-type triplex formation for the 27mer and 36mer are given as supporting information (Figure S2), along with calculated masses for the range of possible nucleic acid species observed (Table S1).
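When assigning a species with twice the mass of the triplex, it helps to recall how m/z depends on charge in positive ion mode. The sketch below assumes that charge arises only from added protons and uses a hypothetical triplex mass chosen purely for illustration (the calculated masses are in Table S1 and are not reproduced here). It shows that a dimer carrying an even charge 2z coincides in m/z with the monomer at charge z, so odd dimer charge states give the unambiguous signals.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in Da


def mz_positive(neutral_mass_da, charge):
    """m/z of an ion carrying `charge` added protons (positive ion mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


triplex_mass_da = 16500.0  # hypothetical value for illustration only
triplex_7 = mz_positive(triplex_mass_da, 7)
dimer_14 = mz_positive(2 * triplex_mass_da, 14)  # overlaps the 7+ monomer
dimer_13 = mz_positive(2 * triplex_mass_da, 13)  # falls between monomer peaks
```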

Figure 1

Triplex formation is favored for Y*R·Y-type GAA·TTC sequences. Native mass spectra of 18mer oligonucleotides (10 μM, 1 M ammonium acetate, pH 7.4) at a ratio of (a) 1:2 purine:pyrimidine strands, (b) 2:1 purine:pyrimidine strands, and (c) 2:1 purine:pyrimidine strands in the presence of 100 μM MgAc2. Spectra are annotated with charge states and an abbreviation describing the assemblies (single strand TFO, red S; duplex, blue D; triplex, green T; triplex dimer, black TD). First derivative profiles of UV-vis thermal melting curves at 260 nm for 18mer oligonucleotides (2 μM, 1 M ammonium acetate, pH 7.4) are shown in (d); 1:2 purine:pyrimidine strands (dashed line) and 2:1 purine:pyrimidine strands (solid line)

Divalent cations such as Mg2+ are commonly used to stabilize triplex formation, though this is not necessarily a simple electrostatic phenomenon, and primarily results in enhancement of R*R·Y triplex formation [40]. Based on this, we further attempted to form R*R·Y triplex structures for the GAA·TTC sequences by addition of magnesium acetate. Again, there was no evidence by MS for R*R·Y triplex formation in the presence of Mg2+ for any strand length, with the 18mer case shown in Figure 1C.

To ensure that the mass spectra were adequately reporting on solution-phase species and there was no gas-phase bias towards Y*R·Y-type triplexes, complementary UV-vis spectroscopy analysis was performed. DNA is known to thermally dissociate from the duplex state into its constitutive strands at high temperature (> 50 °C), described by the melting temperature (Tm), which corresponds to 50% dissociation. Tm is indicative of stability and can be determined by measuring the characteristic hyperchromicity of the UV absorbance upon strand dissociation, typically at 260 nm. UV-vis melting curves for triplexes therefore have two sigmoidal transitions corresponding to dissociation of the triplex strand followed by melting of the duplex. At 300 nm, hypochromicity results from deprotonation of N3-protonated cytosines involved in Hoogsteen base pairs in the triplex structures and is therefore specific to the triplex–duplex transition [41]. Figure 1D shows dA/dT plots from UV-vis melting curves at 260 nm for 18mer oligonucleotides at both 2:1 purine:pyrimidine and 1:2 purine:pyrimidine ratios. The triplex transition is clearly observed in the Y*R·Y case, with a Tm of 33 °C, whereas only a single duplex melting event is observed with excess R strand (duplex Tm = 65 °C). A summary of Tm values obtained throughout this study by UV-vis analysis is given in Table 1. Similar results were again observed for the longer oligonucleotide sequences, thereby confirming that Y*R·Y-type triplex formation is favored for these GAA·TTC sequences, and supporting the ability of ESI-MS to report on relevant solution-phase populations.

Table 1 Melting Points (Tm) of Y*R·Y DNA and RNA Assemblies as Determined by UV-vis Spectroscopy. Tm is taken from the Maxima of the First Derivative UV-vis Absorbance Plots, With an Error of ± 2 °C

Contradictory reports exist in the literature regarding the relative stability of parallel vs anti-parallel triplexes [42,43,44], likely due to a strong dependence on base sequence and length. It has been stated that, in general under normal laboratory conditions, parallel triplexes are expected to be more stable than anti-parallel ones and, accordingly, they are better characterized in the literature [45]. However, this situation can reverse in physiological environments, especially when the target duplexes contain a poly-G tract [44, 46]. One important attribute of the GAA·TTC sequence that allows for alternative structures to form is mirror-repeat symmetry, which gives the sequence the ability to fold back and hydrogen bond with itself in various triplex conformations. Even considering just the GAA·TTC sequence of relevance to FRDA, conflicting evidence has been provided from in vitro studies for formation of both R*R·Y- [47] and Y*R·Y-type triplexes [10, 48]. However, our observation that these model GAA·TTC oligonucleotides strongly favor formation of parallel Y*R·Y-type assemblies provides supporting evidence that the pyrimidine strand may in fact interact with the repeat expansion and that this form may have currently underappreciated importance in Friedreich’s ataxia.

We have found in explicit-solvent molecular dynamics simulations of analogous R*R·Y and Y*R·Y triplexes that the polypurine TFO backbone is significantly distorted in the R*R·Y triplex, with Hoogsteen hydrogen bonds failing to form between adenine bases, whereas no such distortion or diminishment of hydrogen bonding is observed for the polypyrimidine TFO [49]. These findings are consistent with the greater stability of the Y*R·Y triplex implied by the current experiments.

Triplex Stability Is Influenced by pH and Sequence Length

In the R*R·Y-type triplex, the TFO binds to the purine strand of the duplex with binding of A to A:T and G to G:C (Figure S1). This requires no base protonation and therefore is primarily pH-independent within a biologically relevant context. However, for the pyrimidine motif, N3 of cytosine must be protonated, and hence pyrimidine strand binding is favored at low pH given that the pKa of cytosine in single-stranded DNA lies between 5.2 and 5.5 [50]. Despite this, experimental evidence suggests that cytosines in central positions of the triplex are significantly protonated, even at neutral pH [50, 51]. To probe the stability of the GAA·TTC triplexes at acidic pH, we prepared analogous samples at pH 5.5 for UV-vis spectroscopy and IM-MS analysis.
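The pH dependence of cytosine protonation can be illustrated with the Henderson–Hasselbalch relation. The sketch below uses the single-stranded pKa range quoted above and treats each cytosine as an isolated site, which is a simplifying assumption; as noted in the text, cytosines within a triplex are reported to be more extensively protonated than this simple estimate suggests.

```python
def fraction_protonated(ph, pka):
    """Fraction of a monoprotic base in its protonated form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Compare the two buffer conditions used in this work across the pKa range.
estimates = {
    (ph, pka): fraction_protonated(ph, pka)
    for ph in (5.5, 7.4)
    for pka in (5.2, 5.5)
}
```

On this estimate, roughly one third to one half of isolated cytosines would be protonated at pH 5.5, compared with about 1% at pH 7.4.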

Y*R·Y triplex structures were again readily observed by native MS for all sequence lengths, with the 18mer spectrum given as an example in Figure 2A. It can be noted that the proportion of free duplex and single-stranded species observed is significantly decreased at pH 5.5 compared with that at pH 7.4, supporting the notion that triplex formation is enhanced under these conditions. UV-vis melting analysis of these samples at 260 nm suggested a single hyperchromic transition (Figure S3). However, given the abundance of triplex observed by MS, we attributed this to significantly enhanced stability of the triplex at decreased pH causing Tm values for TFO dissociation to approach that of duplex melting, thereby resulting in overlap of signals in the dA/dT plots. Consequently, samples at pH 5.5 were subsequently interrogated at 300 nm to report specifically on the triplex-to-duplex transition. As seen from data presented in Table 1, duplex stability is unchanged at acidic pH for all strand lengths, with measured Tm’s showing no significant difference under both conditions. However, the Tm values for the triplex species are dramatically increased at pH 5.5, with the TFO dissociation now taking place at a temperature just below that of the duplex transition (Table 1 and Figure 2B). Consequently, acidic pH appears to impart both kinetic and thermodynamic advantages in the formation of the GAA·TTC triplexes of relevance in FRDA.

Figure 2

Y*R·Y-type GAA·TTC triplex formation is enhanced at low pH. Native mass spectra of 18mer oligonucleotides (10 μM, 1 M ammonium acetate, pH 5.5) at a ratio of 1:2 purine:pyrimidine strands are shown in (a). Spectra are annotated with charge states and an abbreviation describing the assemblies (single strand TFO, red S; duplex, blue D; triplex, green T; triplex dimer, black TD). The first derivative profile of the UV-vis thermal melting curve at 300 nm is shown for this sample at 2 μM oligonucleotide concentration in (b)

We also show here that triplex stability is enhanced by increasing strand length. While the signal-to-noise ratio decreases in the mass spectra for longer oligonucleotides, likely due to difficulties in fully desalting these samples, comparison of Figure 1A with Supplementary Figures S2A and S2C reveals that the proportion of free DNA duplex and TFO decreases as the oligonucleotide length increases, indicating triplex formation is favored at longer sequence lengths. This is not surprising given the increased number of Hoogsteen pairs forming between the TFO and central purine strand contributing to the stability of the structure; however, it again provides support for the ability of ESI-IM-MS to adequately report on solution-phase populations. Tm measurements obtained from supporting UV-vis analysis of the relevant triplex and duplex species are given in Table 1. A clear increase in Tm occurs between the 18mer and 27mer species, for both triplex and duplex melting. However, the subsequent increase in melting temperature from 27mer to 36mer species is proportionally smaller, suggesting that there are diminishing gains in stability with increasing length, likely due to entropic costs of forming a completely bound triplex as the TFO length increases.

Higher Order Triplex Structures

In order to further characterize the structures of the DNA assemblies, we recorded IM-MS spectra and extracted CCS values for the major species observed. Since the mass measurement alone confirms that we observe only Y*R·Y-type triplexes, which by the constraints of Watson–Crick and Hoogsteen bonding must combine in a parallel fashion, we use CCS measurement here simply to investigate if the triplex structures deviate from idealized geometry during gas-phase analysis.

The IM arrival time distributions (ATDs) for the predominant charge state for each of the 18mer, 27mer, and 36mer Y*R·Y triplex ions at pH 7.4 are shown in Figure 3A–C. Interestingly, a bimodal conformational distribution of ions is observed, with a minor population of the triplexes having a shorter arrival time, which becomes more easily resolved for the longer strand lengths. This effect is, however, largely mitigated at acidic pH (Figure 3D) and indicates some form of structural collapse is taking place as a result of decreased stability in the triplex structure. Since no further analyses were performed to verify if this is reflective of a solution-phase phenomenon or to characterize the structure, we refrain from speculating on the molecular detail of this more compact form, although this could be of interest for further study.

Figure 3

Y*R·Y GAA·TTC triplex structure is largely retained in the gas-phase. Ion mobility arrival time distributions for predominant charge states of Y*R·Y GAA·TTC triplexes (10 μM, 1 M ammonium acetate, at a ratio of 1:2 purine:pyrimidine strands) for (a) 18mer, (b) 27mer, and (c) 36mer oligonucleotides at pH 7.4, and (d) 36mer oligonucleotides at pH 5.5. Charge states are indicated in the figure. IM drift tube voltage was 1700 V

The centroid arrival times for the predominant charge states of the Y*R·Y triplexes were converted to CCS measurements using a multifield approach [52] performed in the vendor-supplied software (IM-MS Browser, Agilent Technologies) for all sequence lengths, which are summarized in Table 2. A degree of variability is observed across the charge states, with more highly charged ions recording larger CCS values, consistent with analogous protein IM-MS studies [53]. This phenomenon was also observed by Arcella et al., who performed extensive molecular dynamics simulations complemented with IM-MS to characterize the conformational ensemble of parallel DNA triplexes in the gas-phase [29]. Their study demonstrated that, despite a range of conformational transitions being induced by vaporization, global descriptors such as overall shape were highly conserved, and as such, the gas-phase triplex maintains an excellent memory of the solution structure, with well-preserved helicity and native contacts. We generated models of idealized parallel triplex structures for the GAA·TTC Y*R·Y triplex to enable comparison with theoretical CCS values (Table 2). In all cases, good agreement is observed between experimental and theoretical CCS values for the predominant charge states, again supporting the notion that IM-MS analysis reports on gas-phase structures that are most likely derived from relevant solution structures for these Y*R·Y triplexes.

Table 2 Collision Cross Sections (CCS) of Y*R·Y Triplex Assemblies, Determined by Ion Mobility-Mass Spectrometry. Experimental CCS is Given for the Dominant Charge States as CCS ± Standard Deviation Obtained from Replicate Measurements. Theoretical CCS Represents the Average Value Determined from MD Simulation of Idealized Structures (Described in the Experimental Section)

To further investigate the influence of increasing charge (as a result of the electrospray ionization process) on the CCSs of the triplexes, additional theoretical structures were produced for the 27mer DNA triplex assembly having charge states consistent with experimental observation (8+, 9+, and 10+). This study was limited to the 27mer firstly due to the large computational demand of the MD studies and secondly since the 27mer gave an additional data point (3 major charge states were observed experimentally rather than just 2 for the 18mer and 36mer triplexes). It is, however, important to reiterate here that the formation of Y*R·Y triplexes is fundamentally dependent upon cytosine protonation (again due to the constraints of Hoogsteen hydrogen bonding). Consequently, when considering structures of these parallel triplexes, the cytosines are already protonated, and hence, additional charge is not expected to have any influence on the (CH+)·G binding interactions. Rather, increasing charge as a result of the electrospray process must reside elsewhere on the molecule and hence is expected to contribute to an electrostatic-induced unfolding.

It is not clear from current literature where any additional positive charge is likely to reside, which complicates prediction of appropriate theoretical structures. However, based on reported mononucleotide gas-phase proton affinities [54], adenine was selected here as the site for additional protonation. Structures of appropriate charge state were produced by a combination of TFO cytosine, TFO adenine, and phosphate group protonation, considering two distinct situations whereby the adenine containing the additional charge was either terminal or central on the TFO. Once again, the computational demand of the MD simulations prevented a more exhaustive search of protonation sites. Nevertheless, a consistent increase in theoretical CCS (Supplementary Table S2) is observed for the more highly charged species, which is directly comparable with experiment. This is particularly noticeable when the charge is centrally located on the triplex, consistent with a greater degree of structural perturbation as a result of electrostatic effects.

Also of note in the mass spectra recorded for the 1:2 purine:pyrimidine samples are the low-abundance signals corresponding to the “triplex dimers.” This is not likely an electrospray phenomenon given the concentration of analyte is in the low micromolar range, below that expected to give rise to non-specific association as a result of multiple analyte occupancy in the nano-spray droplets (simulations using Monte Carlo methods have shown that, at a concentration of 10 μM, the proportion of droplets containing multiple copies of analyte is essentially negligible [55, 56]). Furthermore, no such dimers are observed for duplex or single-stranded oligonucleotides. It is known that two triplex segments may associate to form a novel DNA structure called “sticky DNA,” though this has only been demonstrated to date in supercoiled plasmids with long triplex-forming repeat lengths [12, 13]. As a point of comparison, the CCS measured for the 18mer triplex dimer is significantly smaller than that of the 36mer triplex (Table 2), indicating this structure is more compact, and therefore, the strands are likely associating with an interface along the helix axis. While the biological relevance of the triplex dimer observed here is still unclear, in combination with other evidence for triplex dimerization [57], these data may point to a propensity for self-association of GAA-rich Y*R·Y-type triplexes.

RNA as the Triplex-Forming Oligonucleotide

To investigate the properties of RNA–DNA hybrid triplexes, we obtained the corresponding 18mer RNA TFO, (CUU)6, and once again investigated the formation and stability of triplex assemblies by IM-MS and UV-vis spectroscopy. As can be seen by IM-MS, the Y*R·Y-type hybrid triplex is readily observed at pH 7.4 from the RNA TFO, with a similar charge state distribution to that of the DNA analog centered around 7+, in addition to some unbound DNA duplex (Figure 4A). Once again, decreasing pH to 5.5 appears to favor triplex formation, with increased relative abundance of signals corresponding to triplex and higher order structures noted in the mass spectrum (Figure S4). The ATDs for these ions are principally narrow and monomodal, indicative of a conformationally rigid structure (Figure 4C). The drift times for the RNA–DNA hybrid triplex ions are marginally longer than those of the DNA-only triplex, indicating a similar overall size. Furthermore, a triplex dimer species is again observed for the RNA–DNA hybrid assembly, suggesting these hybrid structures are also prone to self-association.

Figure 4

Triplex formation is favored for Y*R·Y-type RNA-DNA hybrid assemblies. Native mass spectra of 18mer oligonucleotides (10 μM, 1 M ammonium acetate, pH 7.4) at a ratio of (a) 1:1:1 purine DNA:pyrimidine DNA:pyrimidine RNA strands and (b) 1:2:1 purine DNA:pyrimidine DNA:pyrimidine RNA strands. Spectra are annotated with charge states and an abbreviation describing the assemblies (single strand TFO, red S; duplex, blue D; triplex, green T; triplex dimer, black TD). Ion mobility arrival time distribution for the 7+ charge state of the RNA–DNA hybrid triplexes at pH 7.4 is shown in (c). IM drift tube voltage was 1700 V

In a competition experiment in which equimolar RNA and DNA TFO strands were added to the 18mer duplex, the hybrid assembly formed preferentially, leaving mostly unbound DNA TFO (Figure 4B). This suggests a faster rate of formation and/or a greater stability of the RNA–DNA hybrid triplex. Supporting this, UV-vis melting curves show that Tm is increased for the RNA–DNA hybrid structure compared with the DNA-only structure at both pH 5.5 and pH 7.4 (Table 1). It should be noted, however, that it was not possible to accurately measure Tm for duplex melting at pH 5.5, since the corresponding hypochromicity event was not resolvable from triplex dissociation. An estimate based on the peak width in the first derivative profile is 65–69 °C.
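
As an illustration of how a Tm and a peak-width range can be read from a first derivative profile, the following minimal Python sketch locates the maximum of dA/dT and the temperatures at which the derivative falls to half of that maximum. The melting curve is synthetic (a single logistic transition with a hypothetical Tm of 60 °C), and the function name and the half-maximum criterion are assumptions made for illustration; this is not the data processing used in this work.

```python
import numpy as np

def first_derivative_tm(temps_c, absorbance):
    """Estimate Tm as the maximum of dA/dT and return the half-maximum range.

    Hypothetical helper for illustration; assumes a single dominant transition.
    """
    temps_c = np.asarray(temps_c, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)

    # Numerical first derivative of absorbance with respect to temperature.
    d_abs = np.gradient(absorbance, temps_c)

    # The melting temperature is taken as the position of the derivative peak.
    peak = int(np.argmax(d_abs))
    half_max = 0.5 * d_abs[peak]

    # Walk outwards from the peak to find the contiguous half-maximum region.
    lo = peak
    while lo > 0 and d_abs[lo - 1] >= half_max:
        lo -= 1
    hi = peak
    while hi < len(d_abs) - 1 and d_abs[hi + 1] >= half_max:
        hi += 1

    return temps_c[peak], (temps_c[lo], temps_c[hi])

# Synthetic melting curve (assumed values, not measured data).
temps = np.arange(20.0, 95.0, 0.5)
absorb = 0.80 + 0.20 / (1.0 + np.exp(-(temps - 60.0) / 2.0))

tm, (low, high) = first_derivative_tm(temps, absorb)
print(f"Tm = {tm:.1f} C, half-maximum range = {low:.1f} to {high:.1f} C")
```

When two transitions overlap, as for triplex dissociation and duplex melting at pH 5.5, the derivative peaks merge and a single maximum is no longer meaningful, which is why only a range based on the peak width can be quoted.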

Despite a clear understanding that repression of frataxin expression is central to the etiology of FRDA, debate remains regarding the mechanism by which this repression is mediated, including the identity of the TFO in any associated triplex structures. Recently, polymerase arrest at long GAA repeats has been attributed to hybrid RNA–DNA assemblies [14], supporting an under-represented model in which mRNA forms the TFO in FRDA triplexes. Furthermore, bioinformatic analysis of potential triplex-forming sequences has revealed enrichment of such sites at regulatory elements, mainly in promoters and enhancers, suggesting a wider potential role for RNA–DNA triplexes in transcriptional regulation [58]. Our results indicate that hybrid triplexes are more stable than their DNA-based counterparts, lending further support to the hypothesis that mRNA TFOs may contribute to gene repression in FRDA.

Netropsin Interaction with DNA Triplexes

It has been shown that sequence-specific polyamides can alleviate the transcription inhibition associated with long GAA·TTC repeats in FRDA [59]. Netropsin is a well-characterized DNA minor groove-binding polyamide that serves as a model for the study of drug–DNA interactions. Our final experiments therefore utilized native MS to probe the interaction between GAA·TTC hybrid triplexes and netropsin. Addition of netropsin at a 0.5 molar ratio to either 18mer DNA or RNA–DNA hybrid Y*R·Y triplexes resulted in minimal triplex dissociation, particularly for the RNA–DNA hybrid case (Figure 5). However, a netropsin adduct was observed for the DNA duplex, indicating exclusive binding of the ligand to free duplex structures.

Figure 5

Netropsin favors binding to duplex DNA structures. Native mass spectra of 18mer oligonucleotides (10 μM, 1 M ammonium acetate, pH 7.4) in the presence of 5 μM netropsin at a ratio of (a) 1:2 purine DNA:pyrimidine DNA strands and (b) 1:1:1 purine DNA:pyrimidine DNA:pyrimidine RNA strands. Spectra are annotated with charge states and an abbreviation describing the assemblies (single strand TFO, red S; duplex, blue D; triplex, green T; triplex dimer, black TD; netropsin bound, *).

While it is generally accepted in the literature that netropsin binding thermally destabilizes triplex structures while stabilizing the duplex [17], contradictory reports exist regarding the relative binding affinities to triplex and duplex (e.g., [15, 17]). Here, we find evidence that netropsin binding to the duplex is more favorable, and hence that sub-stoichiometric concentrations would be insufficient for extensive triplex destabilization in the FRDA context. This may also provide a mechanistic explanation for the fact that netropsin is unable to increase transcription through the GAA repeats found in Friedreich’s ataxia in in vivo models [60]. Although netropsin interactions with DNA triplexes are widely reported, to our knowledge, this is the first investigation of interactions with an RNA–DNA hybrid form.

Conclusions

IM-MS has enabled us to probe the structures and stabilities of GAA·TTC DNA and RNA triplex assemblies of 18, 27, and 36 bases in length. We have shown that these gas-phase measurements are highly reflective of the triplex solution-phase properties, that Y*R·Y triplex structures are formed for these sequences in preference to R*R·Y-type assemblies, and that the triplexes have a propensity to self-associate. Consistent with previous work, triplex stability is found to increase with oligonucleotide length and acidic pH. However, while it is often suggested that the pH dependence of Y*R·Y triplexes is a limiting factor in triplex formation due to the need for cytosine protonation, it is clear from this work that these triplexes can form even at mildly basic pH. We also demonstrate that more stable triplex structures are formed from an RNA-based TFO, which may imply that these are the likely triplex structures formed in FRDA, and which also has implications for the general design of TFOs with the goal of altering gene expression. Our results therefore provide novel chemical and biological insights regarding the structure and function of unusual DNA conformations in FRDA, and for the development of sequence-specific gene targeting tools.